-
Numerically robust Gaussian state estimation with singular observation noise
Authors:
Nicholas Krämer,
Filip Tronarp
Abstract:
This article proposes numerically robust algorithms for Gaussian state estimation with singular observation noise. Our approach combines a series of basis changes with Bayes' rule, transforming the singular estimation problem into a nonsingular one with reduced state dimension. In addition to ensuring low runtime and numerical stability, our proposal facilitates marginal-likelihood computations and Gauss-Markov representations of the posterior process. We analyse the proposed method's computational savings and numerical robustness and validate our findings in a series of simulations.
Submitted 13 March, 2025;
originally announced March 2025.
-
Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements
Authors:
Nicholas Krämer
Abstract:
Despite substantial progress in recent years, probabilistic solvers with adaptive step sizes still cannot solve memory-demanding differential equations -- unless we care only about a single point in time (which is far too restrictive; we want the whole time series). Counterintuitively, the culprit is the adaptivity itself: its unpredictable memory demands easily exceed our machine's capabilities, making our simulations fail unexpectedly and without warning. Still, dropping adaptivity would abandon years of progress, which can't be the answer. In this work, we solve this conundrum. We develop an adaptive probabilistic solver with fixed memory demands, building on recent developments in robust state estimation. Switching to our method (i) eliminates memory issues for long time series, (ii) accelerates simulations by orders of magnitude through unlocking just-in-time compilation, and (iii) makes adaptive probabilistic solvers compatible with scientific computing in JAX.
Submitted 3 July, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Numerically Robust Fixed-Point Smoothing Without State Augmentation
Authors:
Nicholas Krämer
Abstract:
Practical implementations of Gaussian smoothing algorithms have received a great deal of attention in the last 60 years. However, almost all work focuses on estimating complete time series ("fixed-interval smoothing", $\mathcal{O}(K)$ memory) through variations of the Rauch--Tung--Striebel smoother, rarely on estimating the initial states ("fixed-point smoothing", $\mathcal{O}(1)$ memory). Since fixed-point smoothing is a crucial component of algorithms for dynamical systems with unknown initial conditions, we close this gap by introducing a new formulation of a Gaussian fixed-point smoother. In contrast to prior approaches, our perspective admits a numerically robust Cholesky-based form (without downdates) and avoids state augmentation, which would needlessly inflate the state-space model and reduce the numerical practicality of any fixed-point smoother code. The experiments demonstrate how a JAX implementation of our algorithm matches the runtime of the fastest methods and the robustness of the most robust techniques, while existing implementations must always sacrifice one for the other.
Submitted 23 January, 2025; v1 submitted 30 September, 2024;
originally announced September 2024.
-
A tutorial on automatic differentiation with complex numbers
Authors:
Nicholas Krämer
Abstract:
Automatic differentiation is everywhere, but there exists only minimal documentation of how it works in complex arithmetic beyond stating "derivatives in $\mathbb{C}^d$" $\cong$ "derivatives in $\mathbb{R}^{2d}$" and, at best, shallow references to Wirtinger calculus. Unfortunately, the equivalence $\mathbb{C}^d \cong \mathbb{R}^{2d}$ becomes insufficient as soon as we need to derive custom gradient rules, e.g., to avoid differentiating "through" expensive linear algebra functions or differential equation simulators. To combat such a lack of documentation, this article surveys forward- and reverse-mode automatic differentiation with complex numbers, covering topics such as Wirtinger derivatives, a modified chain rule, and different gradient conventions while explicitly avoiding holomorphicity and the Cauchy--Riemann equations (which would be far too restrictive). To be precise, we will derive, explain, and implement a complex version of Jacobian-vector and vector-Jacobian products almost entirely with linear algebra without relying on complex analysis or differential geometry. This tutorial is a call to action, for users and developers alike, to take complex values seriously when implementing custom gradient propagation rules -- the manuscript explains how.
Submitted 10 December, 2024; v1 submitted 10 September, 2024;
originally announced September 2024.
-
Gradients of Functions of Large Matrices
Authors:
Nicholas Krämer,
Pablo Moreno-Muñoz,
Hrittik Roy,
Søren Hauberg
Abstract:
Tuning scientific and probabilistic machine learning models -- for example, partial differential equations, Gaussian processes, or Bayesian neural networks -- often relies on evaluating functions of matrices whose size grows with the data set or the number of parameters. While the state-of-the-art for evaluating these quantities is almost always based on Lanczos and Arnoldi iterations, the present work is the first to explain how to differentiate these workhorses of numerical linear algebra efficiently. To get there, we derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax when it comes to differentiating PDEs and with GPyTorch for selecting Gaussian process models, and that it beats standard factorisation methods for calibrating Bayesian neural networks. All this is achieved without any problem-specific code optimisation. Find the code at https://github.com/pnkraemer/experiments-lanczos-adjoints and install the library with pip install matfree.
Submitted 24 October, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Continuity of Filters for Discrete-Time Control Problems Defined by Explicit Equations
Authors:
Eugene A. Feinberg,
Sayaka Ishizawa,
Pavlo O. Kasyanov,
David N. Kraemer
Abstract:
Discrete time control systems whose dynamics and observations are described by stochastic equations are common in engineering, operations research, health care, and economics. For example, stochastic filtering problems are usually defined via stochastic equations. These problems can be reduced to Markov decision processes (MDPs) whose states are posterior state distributions, and transition probabilities for such MDPs are sometimes called filters. This paper investigates sufficient conditions on transition and observation functions for the original problems to guarantee weak continuity of the filter. Under mild conditions on cost functions, weak continuity implies the existence of optimal policies minimizing the expected total costs, the validity of optimality equations, and convergence of value iterations to optimal values. This paper uses recent results on weak continuity of filters for partially observable MDPs defined by transition and observation probabilities. It develops a criterion of weak continuity of transition probabilities and a sufficient condition for continuity in total variation of transition probabilities. The results are illustrated with applications to filtering problems.
Submitted 3 February, 2025; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Approximate Bayesian Neural Operators: Uncertainty Quantification for Parametric PDEs
Authors:
Emilia Magnani,
Nicholas Krämer,
Runa Eschenhagen,
Lorenzo Rosasco,
Philipp Hennig
Abstract:
Neural operators are a type of deep architecture that learns to solve (i.e. learns the nonlinear solution operator of) partial differential equations (PDEs). The current state of the art for these models does not provide explicit uncertainty quantification. This is arguably even more of a problem for this kind of task than elsewhere in machine learning, because the dynamical systems typically described by PDEs often exhibit subtle, multiscale structure that makes errors hard for humans to spot. In this work, we first provide a mathematically detailed Bayesian formulation of the "shallow" (linear) version of neural operators in the formalism of Gaussian processes. We then extend this analytic treatment to general deep neural operators using approximate methods from Bayesian deep learning. We extend previous results on neural operators by providing them with uncertainty quantification. As a result, our approach is able to identify cases, and provide structured uncertainty estimates, where the neural operator fails to predict well.
Submitted 2 August, 2022;
originally announced August 2022.
-
Continuity of Discounted Values and the Structure of Optimal Policies for Periodic-Review Inventory Control with Setup Costs
Authors:
Eugene A. Feinberg,
David N. Kraemer
Abstract:
This paper proves continuity of value functions in discounted periodic-review single-commodity total-cost inventory control problems with continuous inventory levels, fixed ordering costs, possibly bounded inventory storage capacity, and possibly bounded order sizes for finite and infinite horizons. In each of these constrained models, the finite and infinite-horizon value functions are continuous, there exist deterministic Markov optimal finite-horizon policies, and there exist stationary deterministic Markov optimal infinite-horizon policies. For models with bounded inventory storage and unbounded order sizes, this paper also characterizes the conditions under which $(s_t, S_t)$ policies are optimal in the finite horizon and an $(s,S)$ policy is optimal in the infinite horizon.
Submitted 26 July, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
ProbNum: Probabilistic Numerics in Python
Authors:
Jonathan Wenger,
Nicholas Krämer,
Marvin Pförtner,
Jonathan Schmidt,
Nathanael Bosch,
Nina Effenberger,
Johannes Zenn,
Alexandra Gessner,
Toni Karvonen,
François-Xavier Briol,
Maren Mahsereci,
Philipp Hennig
Abstract:
Probabilistic numerical methods (PNMs) solve numerical problems via probabilistic inference. They have been developed for linear algebra, optimization, integration and differential equation simulation. PNMs naturally incorporate prior information about a problem and quantify uncertainty due to finite computational resources as well as stochastic input. In this paper, we present ProbNum: a Python library providing state-of-the-art probabilistic numerical solvers. ProbNum enables custom composition of PNMs for specific problem classes via a modular design as well as wrappers for off-the-shelf use. Tutorials, documentation, developer guides and benchmarks are available online at www.probnum.org.
Submitted 3 December, 2021;
originally announced December 2021.
-
Probabilistic Numerical Method of Lines for Time-Dependent Partial Differential Equations
Authors:
Nicholas Krämer,
Jonathan Schmidt,
Philipp Hennig
Abstract:
This work develops a class of probabilistic algorithms for the numerical solution of nonlinear, time-dependent partial differential equations (PDEs). Current state-of-the-art PDE solvers treat the space- and time-dimensions separately, serially, and with black-box algorithms, which obscures the interactions between spatial and temporal approximation errors and misguides the quantification of the overall error. To fix this issue, we introduce a probabilistic version of a technique called method of lines. The proposed algorithm begins with a Gaussian process interpretation of finite difference methods, which then interacts naturally with filtering-based probabilistic ordinary differential equation (ODE) solvers because they share a common language: Bayesian inference. Joint quantification of space- and time-uncertainty becomes possible without losing the performance benefits of well-tuned ODE solvers. Thereby, we extend the toolbox of probabilistic programs for differential equation simulation to PDEs.
Submitted 9 March, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Probabilistic ODE Solutions in Millions of Dimensions
Authors:
Nicholas Krämer,
Nathanael Bosch,
Jonathan Schmidt,
Philipp Hennig
Abstract:
Probabilistic solvers for ordinary differential equations (ODEs) have emerged as an efficient framework for uncertainty quantification and inference on dynamical systems. In this work, we explain the mathematical assumptions and detailed implementation schemes behind solving high-dimensional ODEs with a probabilistic numerical algorithm. This has not been possible before due to matrix-matrix operations in each solver step, but is crucial for scientifically relevant problems -- most importantly, the solution of discretised partial differential equations. In a nutshell, efficient high-dimensional probabilistic ODE solutions build either on independence assumptions or on Kronecker structure in the prior model. We evaluate the resulting efficiency on a range of problems, including the probabilistic numerical simulation of a differential equation with millions of dimensions.
Submitted 22 October, 2021;
originally announced October 2021.
-
Continuity of Parametric Optima for Possibly Discontinuous Functions and Noncompact Decision Sets
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
David N. Kraemer
Abstract:
This paper investigates continuity properties of value functions and solutions for parametric optimization problems. These problems are important in operations research, control, and economics because optimality equations are their particular cases. The classic fact, Berge's maximum theorem, gives sufficient conditions for continuity of value functions and upper semicontinuity of solution multifunctions. Berge's maximum theorem assumes that the objective function is continuous and the multifunction of feasible sets is compact-valued. These assumptions are not satisfied in many applied problems, which historically has limited the relevance of the theorem. This paper generalizes Berge's maximum theorem in three directions: (i) the objective function may not be continuous, (ii) the multifunction of feasible sets may not be compact-valued, and (iii) necessary and sufficient conditions are provided. To illustrate the main theorem, this paper provides applications to inventory control and to the analysis of robust optimization over possibly noncompact action sets and discontinuous objective functions.
Submitted 13 September, 2021;
originally announced September 2021.
-
Linear-Time Probabilistic Solutions of Boundary Value Problems
Authors:
Nicholas Krämer,
Philipp Hennig
Abstract:
We propose a fast algorithm for the probabilistic solution of boundary value problems (BVPs), which are ordinary differential equations subject to boundary conditions. In contrast to previous work, we introduce a Gauss--Markov prior and tailor it specifically to BVPs, which allows computing a posterior distribution over the solution in linear time, at a quality and cost comparable to that of well-established, non-probabilistic methods. Our model further delivers uncertainty quantification, mesh refinement, and hyperparameter adaptation. We demonstrate how these practical considerations positively impact the efficiency of the scheme. Altogether, this results in a practically usable probabilistic BVP solver that is (in contrast to non-probabilistic algorithms) natively compatible with other parts of the statistical modelling tool-chain.
Submitted 14 June, 2021;
originally announced June 2021.
-
Stable Implementation of Probabilistic ODE Solvers
Authors:
Nicholas Krämer,
Philipp Hennig
Abstract:
Probabilistic solvers for ordinary differential equations (ODEs) provide efficient quantification of numerical uncertainty associated with simulation of dynamical systems. Their convergence rates have been established by a growing body of theoretical analysis. However, these algorithms suffer from numerical instability when run at high order or with small step-sizes -- that is, exactly in the regime in which they achieve the highest accuracy. The present work proposes and examines a solution to this problem. It involves three components: accurate initialisation, a coordinate change preconditioner that makes numerical stability concerns step-size-independent, and square-root implementation. Using all three techniques enables numerical computation of probabilistic solutions of ODEs with algorithms of order up to 11, as demonstrated on a set of challenging test problems. The resulting rapid convergence is shown to be competitive with high-order, state-of-the-art, classical methods. As a consequence, a barrier between analysing probabilistic ODE solvers and applying them to interesting machine learning problems is effectively removed.
Submitted 18 December, 2020;
originally announced December 2020.
-
Differentiable Likelihoods for Fast Inversion of 'Likelihood-Free' Dynamical Systems
Authors:
Hans Kersting,
Nicholas Krämer,
Martin Schiegg,
Christian Daniel,
Michael Tiemann,
Philipp Hennig
Abstract:
Likelihood-free (a.k.a. simulation-based) inference problems are inverse problems with expensive, or intractable, forward models. ODE inverse problems are commonly treated as likelihood-free, as their forward map has to be numerically approximated by an ODE solver. This, however, is not a fundamental constraint but just a lack of functionality in classic ODE solvers, which do not return a likelihood but a point estimate. To address this shortcoming, we employ Gaussian ODE filtering (a probabilistic numerical method for ODEs) to construct a local Gaussian approximation to the likelihood. This approximation yields tractable estimators for the gradient and Hessian of the (log-)likelihood. Inserting these estimators into existing gradient-based optimization and sampling methods yields new solvers for ODE inverse problems. We demonstrate that these methods outperform standard likelihood-free approaches on three benchmark systems.
Submitted 29 June, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
-
Convergence rates of Kernel Conjugate Gradient for random design regression
Authors:
Gilles Blanchard,
Nicole Krämer
Abstract:
We prove statistical rates of convergence for kernel-based least squares regression from i.i.d. data using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. Following the setting introduced in earlier related literature, we study so-called "fast convergence rates" depending on the regularity of the target regression function (measured by a source condition in terms of the kernel integral operator) and on the effective dimensionality of the data mapped into the kernel space. We obtain upper bounds, essentially matching known minimax lower bounds, for the $\mathcal{L}^2$ (prediction) norm as well as for the stronger Hilbert norm, if the true regression function belongs to the reproducing kernel Hilbert space. If the latter assumption is not fulfilled, we obtain similar convergence rates for appropriate norms, provided additional unlabeled data are available.
Submitted 8 July, 2016;
originally announced July 2016.
-
Total loss estimation using copula-based regression models
Authors:
Nicole Kraemer,
Eike C. Brechmann,
Daniel Silvestrini,
Claudia Czado
Abstract:
We present a joint copula-based model for insurance claims and sizes. It uses bivariate copulae to accommodate the dependence between these quantities. We derive the general distribution of the policy loss without the restrictive assumption of independence. We illustrate that this distribution tends to be skewed and multi-modal, and that an independence assumption can lead to substantial bias in the estimation of the policy loss. Further, we extend our framework to regression models by combining marginal generalized linear models with a copula. We show that this approach leads to a flexible class of models, and that the parameters can be estimated efficiently using maximum likelihood. We propose a test procedure for the selection of the optimal copula family. The usefulness of our approach is illustrated in a simulation study and in an analysis of car insurance policies.
Submitted 24 September, 2012;
originally announced September 2012.
-
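Imaginärquadratische Einbettung von Ordnungen rationaler Quaternionenalgebren, und die nichtzyklischen endlichen Untergruppen der Bianchi-Gruppen [Imaginary quadratic embedding of orders of rational quaternion algebras, and the non-cyclic finite subgroups of the Bianchi groups]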
Authors:
Norbert Krämer
Abstract:
Let k be an imaginary quadratic number field, let F be a rational quaternion algebra and M an extension of F as a quaternion k-algebra. We are going to classify the F-orders which arise as an intersection of F with a maximal M-order; and we are going to prove that the discriminant of such an intersection determines uniquely the isomorphism type of the corresponding maximal M-order. Building on this, we are going to relate this intersection to the intersection of a second rational quaternion algebra F' in M with a second maximal M-order. This allows us to determine whether the Bianchi group over the maximal k-order contains 3-dihedral, tetrahedral or 2-dihedral groups which are maximal as a finite subgroup. Additionally, we determine the number of maximal M-orders which respectively admit the same intersection with F. Building on this, we calculate the numbers of conjugacy classes of non-cyclic maximal finite subgroups in the given Bianchi group. In the final two chapters, we investigate non-trivial intersections of non-cyclic finite subgroups of the Bianchi groups and extend our results to Eichler orders and especially to Bianchi congruence subgroups.
Submitted 20 February, 2017; v1 submitted 27 July, 2012;
originally announced July 2012.
-
Optimal learning rates for Kernel Conjugate Gradient regression
Authors:
Gilles Blanchard,
Nicole Kraemer
Abstract:
We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is directly related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. The rates depend on two key quantities: first, on the regularity of the target regression function and second, on the intrinsic dimensionality of the data mapped into the kernel space. Lower bounds on attainable rates depending on these two quantities were established in earlier literature, and we obtain upper bounds for the considered method that match these lower bounds (up to a log factor) if the true regression function belongs to the reproducing kernel Hilbert space. If this assumption is not fulfilled, we obtain similar convergence rates provided additional unlabeled data are available. The order of the learning rates matches state-of-the-art results that were recently obtained for least squares support vector machines and for linear regularization operators.
Submitted 29 September, 2010;
originally announced September 2010.
-
The Degrees of Freedom of Partial Least Squares Regression
Authors:
Nicole Kraemer,
Masashi Sugiyama
Abstract:
The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares Regression. Our contribution is an unbiased estimate of its Degrees of Freedom. It is defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: The lower the collinearity is, the higher the Degrees of Freedom are. In particular, they are typically higher than the naive approach that defines the Degrees of Freedom as the number of components. Further, we illustrate how the Degrees of Freedom approach can be used for the comparison of different regression methods. In the experimental section, we show that our Degrees of Freedom estimate in combination with information criteria is useful for model selection.
Submitted 9 February, 2011; v1 submitted 22 February, 2010;
originally announced February 2010.
-
Kernel Partial Least Squares is Universally Consistent
Authors:
Gilles Blanchard,
Nicole Kraemer
Abstract:
We prove the statistical consistency of kernel Partial Least Squares Regression applied to a bounded regression learning problem on a reproducing kernel Hilbert space. Partial Least Squares stands out from well-known classical approaches such as Ridge Regression or Principal Components Regression, as it is not defined as the solution of a global cost minimization procedure over a fixed model, nor is it a linear estimator. Instead, approximate solutions are constructed by projections onto a nested set of data-dependent subspaces. To prove consistency, we exploit the known fact that Partial Least Squares is equivalent to the conjugate gradient algorithm in combination with early stopping. The choice of the stopping rule (number of iterations) is a crucial point. We study two empirical stopping rules. The first one monitors the estimation error in each iteration step of Partial Least Squares, and the second one estimates the empirical complexity in terms of a condition number. Both stopping rules lead to universally consistent estimators provided the kernel is universal.
Submitted 14 January, 2010; v1 submitted 25 February, 2009;
originally announced February 2009.
-
Penalized Partial Least Squares Based on B-Splines Transformations
Authors:
Nicole Kraemer,
Anne-Laure Boulesteix,
Gerhard Tutz
Abstract:
We propose a novel method to model nonlinear regression problems by adapting the principle of penalization to Partial Least Squares (PLS). Starting with a generalized additive model, we expand the additive component of each variable in terms of a generous number of B-Splines basis functions. In order to prevent overfitting and to obtain smooth functions, we estimate the regression model by applying a penalized version of PLS. Although our motivation for penalized PLS stems from its use for B-Splines transformed data, the proposed approach is very general and can be applied to other penalty terms or to other dimension reduction techniques. It turns out that penalized PLS can be computed virtually as fast as PLS. We prove a close connection of penalized PLS to the solutions of preconditioned linear systems. In the case of high-dimensional data, the new method is shown to be an attractive competitor to other techniques for estimating generalized additive models. If the number of predictor variables is high compared to the number of examples, traditional techniques often suffer from overfitting. We illustrate that penalized PLS performs well in these situations.
Submitted 23 August, 2006;
originally announced August 2006.
-
Boosting for Functional Data
Authors:
Nicole Kraemer
Abstract:
We deal with the task of supervised learning if the data is of functional type. The crucial point is the choice of the appropriate fitting method (learner). Boosting is a stepwise technique that combines learners in such a way that the composite learner outperforms the single learner. This can be done by either reweighting the examples or with the help of a gradient descent technique. In this paper, we explain how to extend Boosting methods to problems that involve functional data.
Submitted 30 May, 2006;
originally announced May 2006.
-
On the shrinkage behavior of partial least squares regression
Authors:
Nicole Kraemer
Abstract:
We present a formula for the shrinkage factors of the Partial Least Squares regression estimator and deduce some of their properties, in particular the known fact that some of the factors are >1. We investigate the effect of shrinkage factors on the Mean Squared Error of linear estimators and illustrate that we cannot extend the results to nonlinear estimators. In particular, shrinkage factors >1 do not automatically lead to a poorer Mean Squared Error. We investigate empirically the effect of bounding the absolute value of the Partial Least Squares shrinkage factors by 1.
Submitted 23 March, 2005;
originally announced March 2005.
-
Local models for ramified unitary groups
Authors:
Nicole Kraemer
Abstract:
In this article, we study local models associated to certain Shimura varieties. In particular, we present a resolution of their singularities. As a consequence, we are able to determine the alternating semisimple trace of the geometric Frobenius on the sheaf of nearby cycles.
Submitted 3 February, 2003;
originally announced February 2003.