-
Computer code validation via mixture model estimation
Authors:
Kaniav Kamary,
Merlin Keller,
Pierre Barbillon,
Cédric Gœury,
Éric Parent
Abstract:
When computer codes are used for modeling complex physical systems, their unknown parameters are tuned by calibration techniques. A discrepancy function may be added to the computer code in order to capture its discrepancy with the real physical process. By considering the validation question of a computer code as a Bayesian model selection problem, Damblin et al. (2016) have highlighted a possible confounding effect in certain configurations between the code discrepancy and a linear computer code by using a Bayesian testing procedure based on the intrinsic Bayes factor. In this paper, we investigate the issue of code error identifiability by applying another Bayesian model selection technique which has been recently developed by Kamary et al. (2014). By embedding the competing models within an encompassing mixture model, the method of Kamary et al. (2014) allows each observation to belong to a different mixing component, providing a more flexible inference, while remaining competitive in terms of computational cost with the intrinsic Bayesian approach. By using the parameter-sharing technique mentioned in Kamary et al. (2014), an improper non-informative prior can be used for some computer code parameters and we demonstrate that the resulting posterior distribution is proper. We then check the sensitivity of our posterior estimates to the choice of the parameter prior distributions. We illustrate that the value of the correlation length of the discrepancy Gaussian process prior impacts the Bayesian inference of the mixture model parameters and that the model discrepancy can be identified by applying the Kamary et al. (2014) method when the correlation length is not too small. Finally, the proposed method is applied to a hydraulic code in an industrial context.
Submitted 8 March, 2019;
originally announced March 2019.
-
Validation of a computer code for the energy consumption of a building, with application to optimal electric bill pricing
Authors:
M. Keller,
G. Damblin,
A. Pasanisi,
M. Schuman,
P. Barbillon,
F. Ruggeri,
E. Parent
Abstract:
In this paper, we propose a practical Bayesian framework for the calibration and validation of a computer code, and apply it to a case study concerning the energy consumption forecasting of a building. Validation makes it possible to quantify forecasting uncertainties in view of the code's final use. Here we explore the situation where an energy provider promotes new energy contracts for residential buildings, tailored to each customer's needs, and including a guarantee of energy performance.
Based on power field measurements, collected from an experimental building cell over a certain time period, the code is calibrated, effectively reducing the epistemic uncertainty affecting some code parameters (here albedo, thermal bridge factor and convective coefficient). Validation is conducted by testing the goodness of fit of the code with respect to field measures, and then by propagating the a posteriori parametric uncertainty through the code, yielding probabilistic forecasts of the average electric power delivered inside the cell over a given time period.
To illustrate the benefits of the proposed Bayesian validation framework, we address the decision problem for an energy supplier offering a new type of contract, wherein the customer pays a fixed fee chosen in advance, based on an overall energy consumption forecast. According to Bayesian decision theory, we show how to choose such a fee optimally from the point of view of the supplier, in order to balance short-term benefits with customer loyalty.
Submitted 21 September, 2018;
originally announced October 2018.
-
CaliCo: a R package for Bayesian calibration
Authors:
Mathieu Carmassi,
Pierre Barbillon,
Matthieu Chiodetti,
Merlin Keller,
Eric Parent
Abstract:
In this article, we present a recently released R package for Bayesian calibration. Many industrial fields face infeasible or costly field experiments. These experiments are replaced with numerical (computer) experiments, which are carried out by running a numerical code. Bayesian calibration aims to estimate, through a posterior distribution, the input parameters of the code so that the code outputs are close to the available experimental data. The code can be time-consuming, while Bayesian calibration requires many code calls, which makes studies too burdensome. A discrepancy might also appear between the numerical code and the physical system when experimental data and numerical code outputs are incompatible. The package CaliCo addresses these issues through four statistical models, covering codes that are time-consuming or not, with or without discrepancy. A user guide is provided to illustrate the main functions and their arguments. Finally, a toy example is detailed using CaliCo. This example (based on a real physical system) is in five dimensions and uses simulated data.
Submitted 29 August, 2018; v1 submitted 3 August, 2018;
originally announced August 2018.
-
Post-processing multi-ensemble temperature and precipitation forecasts through an Exchangeable Gamma Normal model and its Tobit extension
Authors:
Marie Courbariaux,
Pierre Barbillon,
Luc Perreault,
Éric Parent
Abstract:
Meteorological ensembles are a collection of scenarios for future weather delivered by a meteorological center. Such ensembles form the main source of valuable information for probabilistic forecasting, which aims at producing a predictive probability distribution of the quantity of interest instead of a single best guess estimate. Unfortunately, ensembles cannot generally be considered as a sample from such a predictive probability distribution without a preliminary post-processing treatment to calibrate the ensemble. Two main families of post-processing methods, either competing such as BMA or collaborative such as EMOS, can be found in the literature. This paper proposes a mixed effect model belonging to the collaborative family. The structure of the model is based on the hypothesis of invariance under the relabelling of the ensemble members. Its interesting specificities are as follows: 1) exchangeability, which contributes to parsimony, with a latent pivot variable synthesizing the essential meteorological features of the ensembles, 2) a multi-ensemble implementation, making it possible to exploit several sources of information so as to increase the sharpness of the forecasting procedure. The focus is on Normal statistical structures, first with a direct application for temperatures, then with its Tobit extension for precipitation. Inference is performed by EM algorithms, with recourse to stochastic conditional simulations in the precipitation case. After checking its good behavior on artificial data, the proposed post-processing technique is applied to temperature and precipitation ensemble forecasts produced over five river basins managed by Hydro-Québec. These ensemble forecasts were extracted from the THORPEX Interactive Grand Global Ensemble (TIGGE) database. The results indicate that post-processed ensembles are calibrated and generally sharper than the raw ensembles.
Submitted 5 March, 2019; v1 submitted 24 April, 2018;
originally announced April 2018.
-
Bayesian calibration of a numerical code for prediction
Authors:
Mathieu Carmassi,
Pierre Barbillon,
Merlin Keller,
Eric Parent,
Matthieu Chiodetti
Abstract:
Field experiments are often difficult and expensive to carry out. To bypass these issues, industrial companies have developed computational codes. These codes are intended to be representative of the physical system, but come with a number of problems. The code is intended to be as close as possible to the physical system. It turns out that, despite continuous code development, the difference between the code outputs and experiments can remain significant. Two kinds of uncertainties are observed. The first one comes from the difference between the physical phenomenon and the values recorded experimentally. The second concerns the gap between the code and the physical system. To reduce this difference, often named model bias, discrepancy, or model error, computer codes are generally made more complex in order to make them more realistic. These improvements lead to time-consuming codes. Moreover, a code often depends on parameters to be set by the user to make the code as close as possible to field data. This estimation task is called calibration. This paper proposes a review of Bayesian calibration methods and is based on an application case which makes it possible to discuss the various methodological choices and to illustrate their divergences. This example is based on a code used to predict the power of a photovoltaic plant.
Submitted 25 March, 2019; v1 submitted 5 January, 2018;
originally announced January 2018.
-
Adaptive numerical designs for the calibration of computer codes
Authors:
Guillaume Damblin,
Pierre Barbillon,
Merlin Keller,
Alberto Pasanisi,
Eric Parent
Abstract:
Making good predictions of a physical system using a computer code requires the inputs to be carefully specified. Some of these inputs, called control variables, have to reproduce physical conditions, whereas other inputs, called parameters, are specific to the computer code and most often uncertain. The goal of statistical calibration consists in estimating these parameters with the help of a statistical model which links the code outputs with the field measurements. In a Bayesian setting, the posterior distribution of these parameters is normally sampled using MCMC methods. However, they are impractical when the code runs are highly time-consuming. A way to circumvent this issue consists of replacing the computer code with a Gaussian process emulator, then sampling a cheap-to-evaluate posterior distribution based on it. Doing so, calibration is subject to an error which strongly depends on the numerical design of experiments used to fit the emulator. We aim at reducing this error by building a proper sequential design by means of the Expected Improvement criterion. Numerical illustrations in several dimensions assess the efficiency of such sequential strategies.
Submitted 3 April, 2018; v1 submitted 25 February, 2015;
originally announced February 2015.
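The Expected Improvement criterion named in the abstract above has a closed form when the emulator's prediction at a candidate point is Gaussian. The sketch below shows the classic minimization form of that criterion; the calibration-specific variant used in the paper may differ, and the function name and arguments (`mu`, `sigma`, `best`) are illustrative assumptions, not taken from the paper.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Classic Expected Improvement (minimization) for a prediction N(mu, sigma^2).

    `best` is the smallest objective value observed so far (hypothetical name).
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (best - mu) * cdf + sigma * pdf
```

In a sequential design, the next code run would typically be placed where this quantity is largest, which favors points that are either predicted to be good or still poorly known by the emulator.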
-
On the Role of Decision Theory in Uncertainty Analysis
Authors:
Merlin Keller,
Eric Parent,
Alberto Pasanisi
Abstract:
Maximum likelihood estimation (MLE) and heuristic predictive estimation (HPE) are two widely used approaches in industrial uncertainty analysis. We review them from the point of view of decision theory, using Bayesian inference as a gold standard for comparison. The main drawback of MLE is that it may fail to properly account for the uncertainty on the physical process generating the data, especially when only a small amount of data is available. HPE offers an improvement in that it takes this uncertainty into account. However, we show that this approach is actually equivalent to Bayes estimation for a particular cost function that is not explicitly chosen by the decision maker. This may produce results that are suboptimal from a decisional perspective. These results plead for a systematic use of Bayes estimators based on carefully defined cost functions.
Submitted 22 September, 2010;
originally announced September 2010.
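To make the link between a cost function and a Bayes estimator concrete, here is a minimal sketch for one standard case, which is an assumption and not a cost function taken from the paper: a piecewise-linear cost with unit penalties `c_under` and `c_over`. Under that cost, the Bayes estimate is the posterior quantile of level `c_under / (c_under + c_over)`. The posterior is represented by Monte Carlo draws, and all names are hypothetical.

```python
import math


def expected_cost(estimate, samples, c_under, c_over):
    """Monte Carlo expected cost of `estimate` under a piecewise-linear loss."""
    total = 0.0
    for x in samples:
        if x > estimate:
            total += c_under * (x - estimate)   # quantity was underestimated
        else:
            total += c_over * (estimate - x)    # quantity was overestimated
    return total / len(samples)


def bayes_estimate(samples, c_under, c_over):
    """Minimizer of the expected cost: the c_under/(c_under+c_over) sample quantile."""
    level = c_under / (c_under + c_over)
    ordered = sorted(samples)
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]
```

With equal penalties this returns a posterior median; making underestimation more costly pushes the estimate toward the upper tail, which shows how the choice of cost function, explicit or not, determines the reported value.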
-
Random effects compound Poisson model to represent data with extra zeros
Authors:
Marie-Pierre Etienne,
Eric Parent,
Benoit Hugues,
Bernier Jacques
Abstract:
This paper describes a compound Poisson-based random effects structure for modeling zero-inflated data. Data with a large proportion of zeros are found in many fields of applied statistics, for example in ecology when trying to model and predict species counts (discrete data) or abundance distributions (continuous data). Standard methods for modeling such data include mixture and two-part conditional models. In contrast to these methods, the stochastic models proposed here behave coherently with regard to a change of scale, since they mimic the harvesting of a marked Poisson process in the modeling steps. Random effects are used to account for inhomogeneity. In this paper, model design and inference both rely on conditional thinking to understand the links between various layers of quantities: parameters, latent variables including random effects and zero-inflated observations. The potential of these parsimonious hierarchical models for zero-inflated data is exemplified using two marine macroinvertebrate abundance datasets from a large-scale scientific bottom-trawl survey. The EM algorithm with a Monte Carlo step based on importance sampling is checked for this model structure on a simulated dataset: it proves to work well for parameter estimation but parameter values matter when re-assessing the actual coverage level of the confidence regions far from the asymptotic conditions.
Submitted 28 July, 2009;
originally announced July 2009.
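The way a compound Poisson structure produces exact zeros can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the paper's model: marks are taken to be exponential, random effects are omitted, and the function names and parameters (`rate`, `mean_mark`) are hypothetical. An observation is zero exactly when no event occurs, which happens with probability exp(-rate).

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_compound_poisson(rate, mean_mark, rng):
    """Sum of a Poisson(rate) number of exponential marks; 0.0 when no event occurs."""
    n_events = sample_poisson(rate, rng)
    return sum((rng.expovariate(1.0 / mean_mark) for _ in range(n_events)), 0.0)
```

Drawing many values with a small `rate` gives a sample with a point mass at zero and a continuous positive part, the zero-inflated shape described in the abstract above.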