-
Hold-out estimates of prediction models for Markov processes
Authors:
Remy Garnier,
Raphaël Langhendries,
Joseph Rynkiewicz
Abstract:
We consider the selection of prediction models for Markovian time series. For this purpose, we study the theoretical properties of the hold-out method. In the econometrics literature, the hold-out method is called out-of-sample and is the main method for selecting a suitable time series model. This method consists of estimating models on a learning set and picking the model with minimal empirical error on a validation set of future observations. Hold-out estimates are well studied in the independent case but, as far as we know, not when the validation set depends on the learning set. In this paper, assuming uniform ergodicity of the Markov chain, we state generalization bounds and oracle inequalities for this method; in particular, we show that the out-of-sample selection method is adaptive to the noise condition.
Submitted 12 April, 2022;
originally announced April 2022.
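The out-of-sample selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes linear autoregressive candidates fitted by least squares, a simulated AR(1) chain as data, and squared error as the loss. The names `lagged` and `holdout_select` are hypothetical.

```python
import numpy as np

def lagged(x, p):
    """Return the matrix of the p previous values and the matching targets."""
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    return X, x[p:]

def holdout_select(x, orders, n_learn):
    """Fit AR(p) on the learning set, score on the later validation set."""
    learn = x[:n_learn]
    errors = {}
    for p in orders:
        X, y = lagged(learn, p)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Validation targets are x[n_learn:]; their lags reach back into the
        # learning set, so the two sets are not independent.
        Xv, yv = lagged(x[n_learn - p :], p)
        errors[p] = float(np.mean((yv - Xv @ coef) ** 2))
    best = min(errors, key=errors.get)
    return best, errors

# Simulated AR(1) chain (an assumption made for this illustration only).
rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.normal()

best, errors = holdout_select(x, orders=[1, 2, 3], n_learn=1500)
print(best, errors)
```

The validation set consists of observations that come after the learning set, which is the setting the abstract contrasts with the independent case.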
-
Spectral estimation for non-linear long range dependent discrete time trawl processes
Authors:
Paul Doukhan,
François Roueff,
Joseph Rynkiewicz
Abstract:
Discrete time trawl processes constitute a large class of time series parameterized by a trawl sequence $(a_j)_{j\in\mathbb{N}}$ and defined through a sequence of independent and identically distributed (i.i.d.) copies of a continuous time process $(\gamma(t))_{t\in\mathbb{R}}$ called the seed process. They provide a general framework for modeling linear or non-linear long range dependent time series. We investigate the spectral estimation, either pointwise or broadband, of long range dependent discrete-time trawl processes. The difficulty arising from the variety of seed processes and of trawl sequences is twofold. First, the spectral density may take different forms, often including smooth additive correction terms. Second, trawl processes with similar spectral densities may exhibit very different statistical behaviors. We prove the consistency of our estimators under very general conditions and we show that a wide class of trawl processes satisfy them. This is done in particular by introducing a weighted weak dependence index that can be of independent interest. The broadband spectral estimator includes an estimator of the long memory parameter. We complete this work with numerical experiments to evaluate the finite sample size performance of this estimator for various integer-valued discrete time trawl processes.
Submitted 8 January, 2020;
originally announced January 2020.
-
Asymptotics for regression models under loss of identifiability
Authors:
Joseph Rynkiewicz
Abstract:
This paper discusses the asymptotic behavior of regression models under general conditions. First, we give a general inequality for the difference between the sum of squared errors (SSE) of the estimated regression model and the SSE of the theoretical best regression function in our model. A set of generalized derivative functions is a key tool in deriving this inequality. Under a suitable Donsker condition for this set, we give the asymptotic distribution of the difference of SSE. We show how to get this Donsker property for parametric models even if the parameters characterizing the best regression function are not unique. This result is applied to neural network regression models with redundant hidden units when loss of identifiability occurs.
Submitted 16 September, 2013;
originally announced September 2013.
-
General bound of overfitting for MLP regression models
Authors:
Joseph Rynkiewicz
Abstract:
Multilayer perceptrons (MLPs) with one hidden layer have long been used for non-linear regression. However, in some tasks, MLPs are too powerful, and a small mean square error (MSE) may be due more to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS); however, in numerous cases the assumption of normality of the noise is arbitrary, if not false. In this paper, we present a universal bound for the overfitting of such models under weak assumptions; this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model when the number of data points goes to infinity. As an illustration, we use this theoretical result to propose and compare effective criteria to find the true architecture of an MLP.
Submitted 3 January, 2012;
originally announced January 2012.
-
Asymptotic law of likelihood ratio for multilayer perceptron models
Authors:
Joseph Rynkiewicz
Abstract:
We consider regression models involving multilayer perceptrons (MLP) with one hidden layer and Gaussian noise. The data are assumed to be generated by a true MLP model, and the parameters of the MLP are estimated by maximizing the likelihood of the model. When the number of hidden units of the true model is known, the asymptotic distributions of the maximum likelihood estimator (MLE) and of the likelihood ratio (LR) statistic are easy to compute, the LR statistic converging to a $\chi^2$ law. However, if the number of hidden units is overestimated, the Fisher information matrix of the model is singular and the asymptotic behavior of the MLE is unknown. This paper deals with this case and gives the exact asymptotic law of the LR statistic. Namely, if the parameters of the MLP lie in a suitable compact set, we show that the LR statistic is the supremum of the square of a Gaussian process indexed by a class of limit score functions.
Submitted 28 November, 2010;
originally announced November 2010.
-
Estimating the Number of Components in a Mixture of Multilayer Perceptrons
Authors:
Madalina Olteanu,
Joseph Rynkiewicz
Abstract:
The BIC criterion is widely used by the neural-network community for model selection tasks, although its convergence properties are not always theoretically established. In this paper we focus on estimating the number of components in a mixture of multilayer perceptrons and on proving the convergence of the BIC criterion in this framework. The penalized marginal likelihood for mixture models and hidden Markov models, introduced by Keribin (2000) and Gassiat (2002) respectively, is extended to mixtures of multilayer perceptrons, for which a penalized-likelihood criterion is proposed. We prove its convergence under some hypotheses which essentially involve the bracketing entropy of the class of generalized score functions, and we illustrate it with some numerical examples.
Submitted 4 April, 2008;
originally announced April 2008.
-
A 24-h forecast of ozone peaks and exceedance levels using neural classifiers and weather predictions
Authors:
A. Dutot,
Joseph Rynkiewicz,
F. Steiner,
J. Rude
Abstract:
A neural network combined with a neural classifier is used for real-time forecasting of the hourly maximum ozone concentration in an urban atmosphere in the centre of France. This neural model is based on the MultiLayer Perceptron (MLP) structure. The inputs of the statistical network are model output statistics of the weather predictions from the French National Weather Service. With this neural classifier, the Success Index of forecasting is 78%, whereas it ranges from 65% to 72% with classical MLPs. During the validation phase, in the summer of 2003, six ozone peaks above the threshold were detected. There were actually seven.
Submitted 27 February, 2008;
originally announced February 2008.
-
Consistent estimation of the architecture of multilayer perceptrons
Authors:
Joseph Rynkiewicz
Abstract:
We consider regression models involving multilayer perceptrons (MLP) with one hidden layer and Gaussian noise. The parameters of the MLP can be estimated by maximizing the likelihood of the model. In this framework, it is difficult to determine the true number of hidden units using an information criterion, such as the Bayesian information criterion (BIC), because the Fisher information matrix is not invertible if the number of hidden units is overestimated. Indeed, the classical theoretical justification of information criteria relies entirely on the invertibility of this matrix. However, using recent methodology introduced to deal with models with a loss of identifiability, we prove that a suitable information criterion leads to consistent estimation of the true number of hidden units.
Submitted 22 February, 2008;
originally announced February 2008.
-
Estimation consistante de l'architecture des perceptrons multicouches
Authors:
Joseph Rynkiewicz
Abstract:
We consider regression models involving multilayer perceptrons (MLP) with one hidden layer and Gaussian noise. The parameters of the MLP can be estimated by maximizing the likelihood of the model. In this framework, it is difficult to determine the true number of hidden units because the Fisher information matrix is not invertible if this number is overestimated. However, if the parameters of the MLP are in a compact set, we prove that the minimization of a suitable information criterion leads to consistent estimation of the true number of hidden units.
Submitted 21 February, 2008;
originally announced February 2008.
-
Consistance d'un estimateur de minimum de variance étendue
Authors:
Joseph Rynkiewicz
Abstract:
We consider a generalization of the criterion minimized by the K-means algorithm, where a neighborhood structure is used in the computation of the variance. Such a tool is used, for example with Kohonen maps, to measure the quality of a quantization that preserves the neighborhood relationships. If we assume that the parameter vector is in a compact Euclidean space and that all its components are separated by a minimal distance, we show the strong consistency of the set of parameters almost realizing the minimum of the empirical extended variance.
Submitted 21 February, 2008;
originally announced February 2008.
-
Efficient Estimation of Multidimensional Regression Model using Multilayer Perceptrons
Authors:
Joseph Rynkiewicz
Abstract:
This work concerns the estimation of multidimensional nonlinear regression models using multilayer perceptrons (MLPs). The main problem with such models is that we need to know the covariance matrix of the noise to get an optimal estimator. However, we show in this paper that if we choose as the cost function the logarithm of the determinant of the empirical error covariance matrix, then we get an asymptotically optimal estimator. Moreover, under suitable assumptions, we show that this cost function leads to a very simple asymptotic law for testing the number of parameters of an identifiable MLP. Numerical experiments confirm the theoretical results.
Submitted 21 February, 2008;
originally announced February 2008.
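The cost function named in this abstract, the logarithm of the determinant of the empirical error covariance matrix, can be sketched as follows. This is an illustration under stated assumptions, not the paper's experiment: a linear model stands in for the MLP to keep the example short, the data are simulated, and `logdet_cost` is a hypothetical name.

```python
import numpy as np

def logdet_cost(residuals):
    """log det of the empirical error covariance (residuals: n x d array)."""
    n = residuals.shape[0]
    cov = residuals.T @ residuals / n
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("empirical error covariance is not positive definite")
    return logdet

# Simulated two-dimensional regression with correlated noise (assumed setup).
rng = np.random.default_rng(0)
n, d_in, d_out = 500, 3, 2
X = rng.normal(size=(n, d_in))
B_true = rng.normal(size=(d_in, d_out))
noise_cov = np.array([[1.0, 0.8], [0.8, 2.0]])
Y = X @ B_true + rng.multivariate_normal(np.zeros(d_out), noise_cov, size=n)

# In the linear case the least-squares fit also minimizes the log-det cost,
# so any perturbation of the coefficients increases it.
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
cost_ols = logdet_cost(Y - X @ B_ols)
cost_perturbed = logdet_cost(Y - X @ (B_ols + 0.1))
print(cost_ols, cost_perturbed)
```

The cost never requires the noise covariance matrix as an input, which is the practical point made in the abstract. For an MLP the same function would be minimized numerically over the network weights.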
-
Self Organizing Map algorithm and distortion measure
Authors:
Joseph Rynkiewicz
Abstract:
We study the statistical meaning of the minimization of the distortion measure and the relation between the equilibrium points of the SOM algorithm and the minima of the distortion measure. If we assume that the observations and the map lie in a compact Euclidean space, we prove the strong consistency of the map which almost minimizes the empirical distortion. Moreover, after calculating the derivatives of the theoretical distortion measure, we show that the points minimizing this measure and the equilibria of the Kohonen map do not match in general. We illustrate, with a simple example, how this occurs.
Submitted 21 February, 2008;
originally announced February 2008.
-
Estimation of linear autoregressive models with Markov-switching, the E.M. algorithm revisited
Authors:
Joseph Rynkiewicz
Abstract:
This work concerns the estimation of linear autoregressive models with Markov-switching using the expectation maximisation (E.M.) algorithm. Our method generalises the method introduced by Elliot for general hidden Markov models and avoids the use of backward recursion.
Submitted 21 February, 2008;
originally announced February 2008.
-
Efficient Estimation of Multidimensional Regression Model with Multilayer Perceptron
Authors:
Joseph Rynkiewicz
Abstract:
This work concerns the estimation of multidimensional nonlinear regression models using a multilayer perceptron (MLP). The main problem with such a model is that we have to know the covariance matrix of the noise to get an optimal estimator. However, we show that if we choose as the cost function the logarithm of the determinant of the empirical error covariance matrix, we get an asymptotically optimal estimator.
Submitted 21 February, 2008;
originally announced February 2008.
-
Testing the number of parameters with multidimensional MLP
Authors:
Joseph Rynkiewicz
Abstract:
This work concerns testing the number of parameters in a multilayer perceptron (MLP) with one hidden layer. For this purpose we assume that we have identifiable models, up to a finite group of transformations on the weights; this is the case, for example, when the number of hidden units is known. In this framework, we show that we get a simple asymptotic distribution if we use the logarithm of the determinant of the empirical error covariance matrix as the cost function.
Submitted 21 February, 2008;
originally announced February 2008.
-
Estimation and Test for Multidimensional Regression Models
Authors:
Joseph Rynkiewicz
Abstract:
This work is concerned with the estimation of multidimensional regression and the asymptotic behaviour of the test involved in selecting models. The main problem with such models is that we need to know the covariance matrix of the noise to get an optimal estimator. We show in this paper that if we choose to minimise the logarithm of the determinant of the empirical error covariance matrix, then we get an asymptotically optimal estimator. Moreover, under suitable assumptions, we show that this cost function leads to a very simple asymptotic law for testing the number of parameters of an identifiable and regular regression model. Numerical experiments confirm the theoretical results.
Submitted 19 February, 2008;
originally announced February 2008.