-
Feature space approximation for kernel-based supervised learning
Authors:
Patrick Gelß,
Stefan Klus,
Ingmar Schuster,
Christof Schütte
Abstract:
We propose a method for the approximation of high- or even infinite-dimensional feature vectors, which play an important role in supervised learning. The goal is to reduce the size of the training data, resulting in lower storage consumption and computational complexity. Furthermore, the method can be regarded as a regularization technique, which improves the generalizability of learned target fun…
▽ More
We propose a method for the approximation of high- or even infinite-dimensional feature vectors, which play an important role in supervised learning. The goal is to reduce the size of the training data, resulting in lower storage consumption and computational complexity. Furthermore, the method can be regarded as a regularization technique, which improves the generalizability of learned target functions. We demonstrate significant improvements in comparison to the computation of data-driven predictions involving the full training data set. The method is applied to classification and regression problems from different application areas such as image recognition, system identification, and oceanographic time series analysis.
△ Less
Submitted 15 March, 2021; v1 submitted 25 November, 2020;
originally announced November 2020.
-
Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows
Authors:
Kashif Rasul,
Abdul-Saboor Sheikh,
Ingmar Schuster,
Urs Bergmann,
Roland Vollgraf
Abstract:
Time series forecasting is often fundamental to scientific and engineering problems and enables decision making. With ever increasing data set sizes, a trivial solution to scale up predictions is to assume independence between interacting time series. However, modeling statistical dependencies can improve accuracy and enable analysis of interaction effects. Deep learning methods are well suited fo…
▽ More
Time series forecasting is often fundamental to scientific and engineering problems and enables decision making. With ever increasing data set sizes, a trivial solution to scale up predictions is to assume independence between interacting time series. However, modeling statistical dependencies can improve accuracy and enable analysis of interaction effects. Deep learning methods are well suited for this problem, but multivariate models often assume a simple parametric distribution and do not scale to high dimensions. In this work we model the multivariate temporal dynamics of time series via an autoregressive deep learning model, where the data distribution is represented by a conditioned normalizing flow. This combination retains the power of autoregressive models, such as good performance in extrapolation into the future, with the flexibility of flows as a general purpose high-dimensional distribution model, while remaining computationally tractable. We show that it improves over the state-of-the-art for standard metrics on many real-world data sets with several thousand interacting time-series.
△ Less
Submitted 14 January, 2021; v1 submitted 14 February, 2020;
originally announced February 2020.
-
A Rigorous Theory of Conditional Mean Embeddings
Authors:
Ilja Klebanov,
Ingmar Schuster,
T. J. Sullivan
Abstract:
Conditional mean embeddings (CMEs) have proven themselves to be a powerful tool in many machine learning applications. They allow the efficient conditioning of probability distributions within the corresponding reproducing kernel Hilbert spaces (RKHSs) by providing a linear-algebraic relation for the kernel mean embeddings of the respective joint and conditional probability distributions. Both cen…
▽ More
Conditional mean embeddings (CMEs) have proven themselves to be a powerful tool in many machine learning applications. They allow the efficient conditioning of probability distributions within the corresponding reproducing kernel Hilbert spaces (RKHSs) by providing a linear-algebraic relation for the kernel mean embeddings of the respective joint and conditional probability distributions. Both centred and uncentred covariance operators have been used to define CMEs in the existing literature. In this paper, we develop a mathematically rigorous theory for both variants, discuss the merits and problems of each, and significantly weaken the conditions for applicability of CMEs. In the course of this, we demonstrate a beautiful connection to Gaussian conditioning in Hilbert spaces.
△ Less
Submitted 14 April, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
-
Set Flow: A Permutation Invariant Normalizing Flow
Authors:
Kashif Rasul,
Ingmar Schuster,
Roland Vollgraf,
Urs Bergmann
Abstract:
We present a generative model that is defined on finite sets of exchangeable, potentially high dimensional, data. As the architecture is an extension of RealNVPs, it inherits all its favorable properties, such as being invertible and allowing for exact log-likelihood evaluation. We show that this architecture is able to learn finite non-i.i.d. set data distributions, learn statistical dependencies…
▽ More
We present a generative model that is defined on finite sets of exchangeable, potentially high dimensional, data. As the architecture is an extension of RealNVPs, it inherits all its favorable properties, such as being invertible and allowing for exact log-likelihood evaluation. We show that this architecture is able to learn finite non-i.i.d. set data distributions, learn statistical dependencies between entities of the set and is able to train and sample with variable set sizes in a computationally efficient manner. Experiments on 3D point clouds show state-of-the art likelihoods.
△ Less
Submitted 6 September, 2019;
originally announced September 2019.
-
Kernel Conditional Density Operators
Authors:
Ingmar Schuster,
Mattes Mollenhauer,
Stefan Klus,
Krikamol Muandet
Abstract:
We introduce a novel conditional density estimation model termed the conditional density operator (CDO). It naturally captures multivariate, multimodal output densities and shows performance that is competitive with recent neural conditional density models and Gaussian processes. The proposed model is based on a novel approach to the reconstruction of probability densities from their kernel mean e…
▽ More
We introduce a novel conditional density estimation model termed the conditional density operator (CDO). It naturally captures multivariate, multimodal output densities and shows performance that is competitive with recent neural conditional density models and Gaussian processes. The proposed model is based on a novel approach to the reconstruction of probability densities from their kernel mean embeddings by drawing connections to estimation of Radon-Nikodym derivatives in the reproducing kernel Hilbert space (RKHS). We prove finite sample bounds for the estimation error in a standard density reconstruction scenario, independent of problem dimensionality. Interestingly, when a kernel is used that is also a probability density, the CDO allows us to both evaluate and sample the output density efficiently. We demonstrate the versatility and performance of the proposed model on both synthetic and real-world data.
△ Less
Submitted 29 October, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
-
A kernel-based approach to molecular conformation analysis
Authors:
Stefan Klus,
Andreas Bittracher,
Ingmar Schuster,
Christof Schütte
Abstract:
We present a novel machine learning approach to understanding conformation dynamics of biomolecules. The approach combines kernel-based techniques that are popular in the machine learning community with transfer operator theory for analyzing dynamical systems in order to identify conformation dynamics based on molecular dynamics simulation data. We show that many of the prominent methods like Mark…
▽ More
We present a novel machine learning approach to understanding conformation dynamics of biomolecules. The approach combines kernel-based techniques that are popular in the machine learning community with transfer operator theory for analyzing dynamical systems in order to identify conformation dynamics based on molecular dynamics simulation data. We show that many of the prominent methods like Markov State Models, EDMD, and TICA can be regarded as special cases of this approach and that new efficient algorithms can be constructed based on this derivation. The results of these new powerful methods will be illustrated with several examples, in particular the alanine dipeptide and the protein NTL9.
△ Less
Submitted 4 December, 2018; v1 submitted 28 September, 2018;
originally announced September 2018.
-
Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces
Authors:
Mattes Mollenhauer,
Ingmar Schuster,
Stefan Klus,
Christof Schütte
Abstract:
Reproducing kernel Hilbert spaces (RKHSs) play an important role in many statistics and machine learning applications ranging from support vector machines to Gaussian processes and kernel embeddings of distributions. Operators acting on such spaces are, for instance, required to embed conditional probability distributions in order to implement the kernel Bayes rule and build sequential data models…
▽ More
Reproducing kernel Hilbert spaces (RKHSs) play an important role in many statistics and machine learning applications ranging from support vector machines to Gaussian processes and kernel embeddings of distributions. Operators acting on such spaces are, for instance, required to embed conditional probability distributions in order to implement the kernel Bayes rule and build sequential data models. It was recently shown that transfer operators such as the Perron-Frobenius or Koopman operator can also be approximated in a similar fashion using covariance and cross-covariance operators and that eigenfunctions of these operators can be obtained by solving associated matrix eigenvalue problems. The goal of this paper is to provide a solid functional analytic foundation for the eigenvalue decomposition of RKHS operators and to extend the approach to the singular value decomposition. The results are illustrated with simple guiding examples.
△ Less
Submitted 16 March, 2020; v1 submitted 24 July, 2018;
originally announced July 2018.
-
Analyzing high-dimensional time-series data using kernel transfer operator eigenfunctions
Authors:
Stefan Klus,
Sebastian Peitz,
Ingmar Schuster
Abstract:
Kernel transfer operators, which can be regarded as approximations of transfer operators such as the Perron-Frobenius or Koopman operator in reproducing kernel Hilbert spaces, are defined in terms of covariance and cross-covariance operators and have been shown to be closely related to the conditional mean embedding framework developed by the machine learning community. The goal of this paper is t…
▽ More
Kernel transfer operators, which can be regarded as approximations of transfer operators such as the Perron-Frobenius or Koopman operator in reproducing kernel Hilbert spaces, are defined in terms of covariance and cross-covariance operators and have been shown to be closely related to the conditional mean embedding framework developed by the machine learning community. The goal of this paper is to show how the dominant eigenfunctions of these operators in combination with gradient-based optimization techniques can be used to detect long-lived coherent patterns in high-dimensional time-series data. The results will be illustrated using video data and a fluid flow example.
△ Less
Submitted 16 May, 2018;
originally announced May 2018.
-
Markov Chain Importance Sampling -- a highly efficient estimator for MCMC
Authors:
Ingmar Schuster,
Ilja Klebanov
Abstract:
Markov chain (MC) algorithms are ubiquitous in machine learning and statistics and many other disciplines. Typically, these algorithms can be formulated as acceptance rejection methods. In this work we present a novel estimator applicable to these methods, dubbed Markov chain importance sampling (MCIS), which efficiently makes use of rejected proposals. For the unadjusted Langevin algorithm, it pr…
▽ More
Markov chain (MC) algorithms are ubiquitous in machine learning and statistics and many other disciplines. Typically, these algorithms can be formulated as acceptance rejection methods. In this work we present a novel estimator applicable to these methods, dubbed Markov chain importance sampling (MCIS), which efficiently makes use of rejected proposals. For the unadjusted Langevin algorithm, it provides a novel way of correcting the discretization error. Our estimator satisfies a central limit theorem and improves on error per CPU cycle, often to a large extent. As a by-product it enables estimating the normalizing constant, an important quantity in Bayesian machine learning and statistics.
△ Less
Submitted 6 August, 2020; v1 submitted 18 May, 2018;
originally announced May 2018.
-
Exact active subspace Metropolis-Hastings, with applications to the Lorenz-96 system
Authors:
Ingmar Schuster,
Paul G. Constantine,
T. J. Sullivan
Abstract:
We consider the application of active subspaces to inform a Metropolis-Hastings algorithm, thereby aggressively reducing the computational dimension of the sampling problem. We show that the original formulation, as proposed by Constantine, Kent, and Bui-Thanh (SIAM J. Sci. Comput., 38(5):A2779-A2805, 2016), possesses asymptotic bias. Using pseudo-marginal arguments, we develop an asymptotically u…
▽ More
We consider the application of active subspaces to inform a Metropolis-Hastings algorithm, thereby aggressively reducing the computational dimension of the sampling problem. We show that the original formulation, as proposed by Constantine, Kent, and Bui-Thanh (SIAM J. Sci. Comput., 38(5):A2779-A2805, 2016), possesses asymptotic bias. Using pseudo-marginal arguments, we develop an asymptotically unbiased variant. Our algorithm is applied to a synthetic multimodal target distribution as well as a Bayesian formulation of a parameter inference problem for a Lorenz-96 system.
△ Less
Submitted 7 December, 2017;
originally announced December 2017.
-
Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces
Authors:
Stefan Klus,
Ingmar Schuster,
Krikamol Muandet
Abstract:
Transfer operators such as the Perron--Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory to reproducing kernel Hilbert spaces and show th…
▽ More
Transfer operators such as the Perron--Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory to reproducing kernel Hilbert spaces and show that these operators are related to Hilbert space representations of conditional distributions, known as conditional mean embeddings in the machine learning community. Moreover, numerical methods to compute empirical estimates of these embeddings are akin to data-driven methods for the approximation of transfer operators such as extended dynamic mode decomposition and its variants. One main benefit of the presented kernel-based approaches is that these methods can be applied to any domain where a similarity measure given by a kernel is available. We illustrate the results with the aid of guiding examples and highlight potential applications in molecular dynamics as well as video and text data analysis.
△ Less
Submitted 13 August, 2019; v1 submitted 5 December, 2017;
originally announced December 2017.
-
Kernel Sequential Monte Carlo
Authors:
Ingmar Schuster,
Heiko Strathmann,
Brooks Paige,
Dino Sejdinovic
Abstract:
We propose kernel sequential Monte Carlo (KSMC), a framework for sampling from static target densities. KSMC is a family of sequential Monte Carlo algorithms that are based on building emulator models of the current particle system in a reproducing kernel Hilbert space. We here focus on modelling nonlinear covariance structure and gradients of the target. The emulator's geometry is adaptively upda…
▽ More
We propose kernel sequential Monte Carlo (KSMC), a framework for sampling from static target densities. KSMC is a family of sequential Monte Carlo algorithms that are based on building emulator models of the current particle system in a reproducing kernel Hilbert space. We here focus on modelling nonlinear covariance structure and gradients of the target. The emulator's geometry is adaptively updated and subsequently used to inform local proposals. Unlike in adaptive Markov chain Monte Carlo, continuous adaptation does not compromise convergence of the sampler. KSMC combines the strengths of sequental Monte Carlo and kernel methods: superior performance for multimodal targets and the ability to estimate model evidence as compared to Markov chain Monte Carlo, and the emulator's ability to represent targets that exhibit high degrees of nonlinearity. As KSMC does not require access to target gradients, it is particularly applicable on targets whose gradients are unknown or prohibitively expensive. We describe necessary tuning details and demonstrate the benefits of the the proposed methodology on a series of challenging synthetic and real-world examples.
△ Less
Submitted 25 July, 2017; v1 submitted 11 October, 2015;
originally announced October 2015.
-
Gradient Importance Sampling
Authors:
Ingmar Schuster
Abstract:
Adaptive Monte Carlo schemes developed over the last years usually seek to ensure ergodicity of the sampling process in line with MCMC tradition. This poses constraints on what is possible in terms of adaptation. In the general case ergodicity can only be guaranteed if adaptation is diminished at a certain rate. Importance Sampling approaches offer a way to circumvent this limitation and design sa…
▽ More
Adaptive Monte Carlo schemes developed over the last years usually seek to ensure ergodicity of the sampling process in line with MCMC tradition. This poses constraints on what is possible in terms of adaptation. In the general case ergodicity can only be guaranteed if adaptation is diminished at a certain rate. Importance Sampling approaches offer a way to circumvent this limitation and design sampling algorithms that keep adapting. Here I present a gradient informed variant of SMC (and its special case Population Monte Carlo) for static problems.
△ Less
Submitted 21 July, 2015;
originally announced July 2015.
-
Consistency of Importance Sampling estimates based on dependent sample sets and an application to models with factorizing likelihoods
Authors:
Ingmar Schuster
Abstract:
In this paper, I proof that Importance Sampling estimates based on dependent sample sets are consistent under certain conditions. This can be used to reduce variance in Bayesian Models with factorizing likelihoods, using sample sets that are much larger than the number of likelihood evaluations, a technique dubbed Sample Inflation. I evaluate Sample Inflation on a toy Gaussian problem and two Mixt…
▽ More
In this paper, I proof that Importance Sampling estimates based on dependent sample sets are consistent under certain conditions. This can be used to reduce variance in Bayesian Models with factorizing likelihoods, using sample sets that are much larger than the number of likelihood evaluations, a technique dubbed Sample Inflation. I evaluate Sample Inflation on a toy Gaussian problem and two Mixture Models.
△ Less
Submitted 1 March, 2015;
originally announced March 2015.
-
A Bayesian Model of node interaction in networks
Authors:
Ingmar Schuster
Abstract:
We are concerned with modeling the strength of links in networks by taking into account how often those links are used. Link usage is a strong indicator of how closely two nodes are related, but existing network models in Bayesian Statistics and Machine Learning are able to predict only wether a link exists at all. As priors for latent attributes of network nodes we explore the Chinese Restaurant…
▽ More
We are concerned with modeling the strength of links in networks by taking into account how often those links are used. Link usage is a strong indicator of how closely two nodes are related, but existing network models in Bayesian Statistics and Machine Learning are able to predict only wether a link exists at all. As priors for latent attributes of network nodes we explore the Chinese Restaurant Process (CRP) and a multivariate Gaussian with fixed dimensionality. The model is applied to a social network dataset and a word coocurrence dataset.
△ Less
Submitted 6 March, 2015; v1 submitted 18 February, 2014;
originally announced February 2014.