-
Learning with Importance Weighted Variational Inference: Asymptotics for Gradient Estimators of the VR-IWAE Bound
Authors:
Kamélia Daudel,
François Roueff
Abstract:
Several popular variational bounds involving importance weighting ideas have been proposed to generalize and improve on the Evidence Lower BOund (ELBO) in the context of maximum likelihood optimization, such as the Importance Weighted Auto-Encoder (IWAE) and the Variational Rényi (VR) bounds. The methodology to learn the parameters of interest using these bounds typically amounts to running gradient-based variational inference algorithms that incorporate the reparameterization trick. However, the way the choice of the variational bound impacts the outcome of variational inference algorithms can be unclear. Recently, the VR-IWAE bound was introduced as a variational bound that unifies the ELBO, IWAE and VR bounds methodologies. In this paper, we provide two analyses for the reparameterized and doubly-reparameterized gradient estimators of the VR-IWAE bound, which reveal the advantages and limitations of these gradient estimators while enabling us to compare the ELBO, IWAE and VR bounds methodologies. Our work advances the understanding of importance weighted variational inference methods and we illustrate our theoretical findings empirically.
Submitted 15 October, 2024;
originally announced October 2024.
-
Monotonic Alpha-divergence Minimisation for Variational Inference
Authors:
Kamélia Daudel,
Randal Douc,
François Roueff
Abstract:
In this paper, we introduce a novel family of iterative algorithms which carry out $α$-divergence minimisation in a Variational Inference context. They do so by ensuring a systematic decrease at each step in the $α$-divergence between the variational and the posterior distributions. In its most general form, the variational distribution is a mixture model and our framework allows us to simultaneously optimise the weights and component parameters of this mixture model. Our approach permits us to build on various methods previously proposed for $α$-divergence minimisation such as Gradient or Power Descent schemes and we also shed new light on an integrated Expectation Maximization algorithm. Lastly, we provide empirical evidence that our methodology yields improved results on several multimodal target distributions and on a real data example.
Submitted 10 April, 2023; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Nonlinear Functional Output Regression: a Dictionary Approach
Authors:
Dimitri Bouche,
Marianne Clausel,
François Roueff,
Florence d'Alché-Buc
Abstract:
To address functional-output regression, we introduce projection learning (PL), a novel dictionary-based approach that learns to predict a function that is expanded on a dictionary while minimizing an empirical risk based on a functional loss. PL makes it possible to use non-orthogonal dictionaries and can then be combined with dictionary learning; it is thus much more flexible than expansion-based approaches relying on vectorial losses. This general method is instantiated with reproducing kernel Hilbert spaces of vector-valued functions as kernel-based projection learning (KPL). For the functional square loss, two closed-form estimators are proposed, one for fully observed output functions and the other for partially observed ones. Both are backed theoretically by an excess risk analysis. Then, in the more general setting of integral losses based on differentiable ground losses, KPL is implemented using first-order optimization for both fully and partially observed output functions. Finally, several robustness aspects of the proposed algorithms are highlighted on a toy dataset, and a study on two real datasets shows that they are competitive with other nonlinear approaches. Notably, using the square loss and a learnt dictionary, KPL enjoys a particularly attractive trade-off between computational cost and performance.
Submitted 26 February, 2021; v1 submitted 3 March, 2020;
originally announced March 2020.
-
New results on approximate Hilbert pairs of wavelet filters with common factors
Authors:
Sophie Achard,
Irène Gannaz,
Marianne Clausel,
François Roueff
Abstract:
In this paper, we consider the design of wavelet filters based on the Thiran common-factor approach proposed in Selesnick [2001]. This approach aims at building finite impulse response filters of a Hilbert-pair of wavelets serving as real and imaginary part of a complex wavelet. Unfortunately it is not possible to construct wavelets which are both finitely supported and analytic. The wavelet filters constructed using the common-factor approach are then approximately analytic. Thus, it is of interest to control their analyticity. The purpose of this paper is to first provide precise and explicit expressions as well as easily exploitable bounds for quantifying the analytic approximation of this complex wavelet. Then, we prove the existence of such filters enjoying the classical perfect reconstruction conditions, with arbitrarily many vanishing moments.
Submitted 25 October, 2017;
originally announced October 2017.
-
Anomaly Detection and Localisation using Mixed Graphical Models
Authors:
Romain Laby,
François Roueff,
Alexandre Gramfort
Abstract:
We propose a method that performs anomaly detection and localisation within heterogeneous data using a pairwise undirected mixed graphical model. The data are a mixture of categorical and quantitative variables, and the model is learned over a dataset that is assumed to contain no anomaly. We then apply the model to temporal data, potentially a data stream, using a version of the two-sided CUSUM algorithm. The proposed decision statistic is based on a conditional likelihood ratio computed for each variable given the others. Our results show that this statistic makes it possible to detect anomalies variable by variable, and thus to localise the variables involved in the anomalies more precisely than univariate methods based on simple marginals.
Submitted 20 July, 2016;
originally announced July 2016.
-
Nonparametric estimation of mark's distribution of an exponential Shot-noise process
Authors:
Paul Ilhe,
Eric Moulines,
François Roueff,
Antoine Souloumiac
Abstract:
In this paper, we consider a nonlinear inverse problem occurring in nuclear science. Gamma rays randomly hit a semiconductor detector which produces an impulse response of electric current. Because the sampling period of the measured current is larger than the mean interarrival time of photons, the impulse responses associated with different gamma rays can overlap: this phenomenon is known as pileup. In this work, it is assumed that the impulse response is an exponentially decaying function. We propose a novel method to infer the distribution of gamma photon energies from the indirect measurements obtained from the detector. This technique is based on a formula linking the characteristic function of the photon density to a function involving the characteristic function of the observations and its derivative. We establish that our estimator converges to the mark density in uniform norm at a logarithmic rate. A limited Monte-Carlo experiment is provided to support our findings.
Submitted 26 January, 2016; v1 submitted 26 June, 2015;
originally announced June 2015.
-
Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes
Authors:
Christophe Giraud,
François Roueff,
Andres Sanchez-Perez
Abstract:
In this work, we study the problem of aggregating a finite number of predictors for nonstationary sub-linear processes. We provide oracle inequalities relying essentially on three ingredients: (1) a uniform bound of the $\ell^1$ norm of the time-varying sub-linear coefficients, (2) a Lipschitz assumption on the predictors and (3) moment conditions on the noise appearing in the linear representation. Two kinds of aggregations are considered, giving rise to different moment conditions on the noise and more or less sharp oracle inequalities. We apply this approach to derive an adaptive predictor for locally stationary time-varying autoregressive (TVAR) processes. It is obtained by aggregating a finite number of well-chosen predictors, each of them enjoying an optimal minimax convergence rate under specific smoothness conditions on the TVAR coefficients. We show that the obtained aggregated predictor achieves a minimax rate while adapting to the unknown smoothness. To prove this result, a lower bound is established for the minimax rate of the prediction risk for the TVAR process. Numerical experiments complete this study. An important feature of this approach is that the aggregated predictor can be computed recursively and is thus applicable in an online prediction context.
Submitted 17 November, 2015; v1 submitted 27 April, 2014;
originally announced April 2014.
-
Ergodicity and scaling limit of a constrained multivariate Hawkes process
Authors:
Ban Zheng,
François Roueff,
Frédéric Abergel
Abstract:
We introduce a multivariate Hawkes process with constraints on its conditional density. It is a multivariate point process with conditional intensity similar to that of a multivariate Hawkes process but certain events are forbidden with respect to boundary conditions on a multidimensional constraint variable, whose evolution is driven by the point process. We study this process in the special case where the fertility function is exponential so that the process is entirely described by an underlying Markov chain, which includes the constraint variable. Some conditions on the parameters are established to ensure the ergodicity of the chain. Moreover, scaling limits are derived for the integrated point process. This study is primarily motivated by the stochastic modelling of a limit order book for high frequency financial data analysis.
Submitted 13 February, 2014; v1 submitted 18 January, 2013;
originally announced January 2013.
-
Testing for homogeneity of variance in the wavelet domain
Authors:
Olaf Kouamo,
Eric Moulines,
François Roueff
Abstract:
The danger of confusing long-range dependence with non-stationarity has been pointed out by many authors. Distinguishing the two is important for modelling time series showing trend-like behavior, such as river run-off in hydrology, historical temperatures in the study of climate change, or packet counts in network traffic engineering. The main goal of this paper is to develop a test procedure to detect the presence of non-stationarity for a class of processes whose $K$-th order difference is stationary. Contrary to most of the proposed methods, the test procedure has the same distribution for short-range and long-range dependence covariance stationary processes, which means that this test is able to detect the presence of non-stationarity for processes showing long-range dependence or having a unit root. The proposed test is formulated in the wavelet domain, where a change in the generalized spectral density results in a change in the variance of wavelet coefficients at one or several scales. Such tests have already been proposed in \cite{whitcher:2001}, but these authors did not take into account the dependence of the wavelet coefficients within scales and between scales. Therefore, the asymptotic distribution of the test they proposed was erroneous; as a consequence, the level of the test under the null hypothesis of stationarity was wrong. In this contribution, we introduce two test procedures, both using an estimator of the variance of the scalogram at one or several scales. The asymptotic distribution of the test under the null is rigorously justified. The pointwise consistency of the test in the presence of a single jump in the generalized spectral density is also presented. A limited Monte-Carlo experiment is performed to illustrate our findings.
Submitted 7 June, 2011;
originally announced June 2011.
-
Detection and localization of change-points in high-dimensional network traffic data
Authors:
Céline Lévy-Leduc,
François Roueff
Abstract:
We propose a novel and efficient method, which we call TopRank, for detecting change-points in high-dimensional data. This issue is of growing concern to the network security community since network anomalies such as Denial of Service (DoS) attacks lead to changes in Internet traffic. Our method consists of a data reduction stage based on record filtering, followed by a nonparametric change-point detection test based on $U$-statistics. Using this approach, we can address massive data streams and perform anomaly detection and localization on the fly. We show how it applies to some real Internet traffic provided by France-Télécom (a French Internet service provider) in the framework of the ANR-RNRT OSCAR project. This approach is very attractive since it benefits from a low computational load and is able to detect and localize several types of network anomalies. We also assess the performance of the TopRank algorithm using synthetic data and compare it with alternative approaches based on random aggregation.
Submitted 17 August, 2009;
originally announced August 2009.