-
Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees
Authors:
Edgar Jaber,
Vincent Blot,
Nicolas Brunel,
Vincent Chabridon,
Emmanuel Remy,
Bertrand Iooss,
Didier Lucor,
Mathilde Mougeot,
Alessandro Leite
Abstract:
Gaussian processes (GPs) are a Bayesian machine learning approach widely used to construct surrogate models for the uncertainty quantification of computer simulation codes in industrial applications. They provide both a mean predictor and an estimate of the posterior prediction variance, the latter being used to produce Bayesian credibility intervals. Interpreting these intervals relies on the Gaussianity of the simulation model as well as the well-specification of the priors, which are not always appropriate. We propose to address this issue with the help of conformal prediction. In the present work, a method for building adaptive cross-conformal prediction intervals is proposed by weighting the non-conformity score with the posterior standard deviation of the GP. The resulting conformal prediction intervals exhibit a level of adaptivity akin to Bayesian credibility sets and display a significant correlation with the surrogate model local approximation error, while being free from the underlying model assumptions and having frequentist coverage guarantees. These estimators can thus be used for evaluating the quality of a GP surrogate model and can assist a decision-maker in the choice of the best prior for the specific application of the GP. The performance of the method is illustrated through a panel of numerical examples based on various reference databases. Moreover, the potential applicability of the method is demonstrated in the context of surrogate modeling of an expensive-to-evaluate simulator of the clogging phenomenon in steam generators of nuclear reactors.
Submitted 15 January, 2024;
originally announced January 2024.
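As an illustration of the weighting idea in the abstract above, the following minimal sketch builds a conformal interval whose non-conformity score is the absolute residual divided by the GP posterior standard deviation. It is a simplification under stated assumptions: it uses a single held-out calibration set (split conformal) rather than the cross-conformal scheme the abstract describes, and the function name and the arrays of posterior means and standard deviations are hypothetical inputs that would come from whatever GP library is in use.

```python
import numpy as np


def normalized_conformal_interval(y_cal, mu_cal, sigma_cal,
                                  mu_new, sigma_new, alpha=0.1):
    """Split-conformal interval with scores |y - mu| / sigma.

    y_cal, mu_cal, sigma_cal: observed outputs, GP posterior means and
    posterior standard deviations on a held-out calibration set
    (hypothetical inputs, not produced by this function).
    mu_new, sigma_new: GP posterior mean and standard deviation at the
    points where an interval is wanted.
    """
    y_cal = np.asarray(y_cal, dtype=float)
    mu_cal = np.asarray(mu_cal, dtype=float)
    sigma_cal = np.asarray(sigma_cal, dtype=float)
    mu_new = np.asarray(mu_new, dtype=float)
    sigma_new = np.asarray(sigma_new, dtype=float)

    # Non-conformity score: residual rescaled by the posterior std.
    scores = np.abs(y_cal - mu_cal) / sigma_cal
    n = scores.size

    # Finite-sample corrected rank, ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    q = np.sort(scores)[k - 1]

    # The interval width scales with the local posterior std.
    return mu_new - q * sigma_new, mu_new + q * sigma_new
```

Under exchangeability of the calibration and test points, intervals built this way have marginal coverage of at least 1 - alpha, whatever the quality of the GP prior; a poor prior shows up as wider intervals rather than as lost coverage.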
-
Sensitivity Analyses of a Multi-Physics Long-Term Clogging Model For Steam Generators
Authors:
Edgar Jaber,
Vincent Chabridon,
Emmanuel Remy,
Michael Baudin,
Didier Lucor,
Mathilde Mougeot,
Bertrand Iooss
Abstract:
Long-term operation of nuclear steam generators can result in the occurrence of clogging, a deposition phenomenon that may increase the risk of mechanical and vibration loadings on tube bundles and internal structures as well as potentially affecting their response to hypothetical accidental transients. To manage and prevent this issue, a robust maintenance program that requires a fine understanding of the underlying physics is essential. This study focuses on the utilization of a clogging simulation code developed by EDF R&D. This numerical tool employs specific physical models to simulate the kinetics of clogging and generates time-dependent clogging rate profiles for particular steam generators. However, certain parameters in this code are subject to uncertainties. To address these uncertainties, Monte Carlo simulations are conducted to assess the distribution of the clogging rate. Subsequently, polynomial chaos expansions are used in order to build a metamodel while time-dependent Sobol' indices are computed to understand the impact of the random input parameters throughout the whole operating time. Comparisons are made with a previously published study and additional Hilbert-Schmidt independence criterion sensitivity indices are computed. Key input-output dependencies are exhibited in the different chemical conditionings and new behavior patterns in high-pH regimes are uncovered by the sensitivity analysis. These findings contribute to a better understanding of the clogging phenomenon while opening future lines of modeling research and helping to make maintenance planning more robust.
Submitted 18 March, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Hoeffding decomposition of black-box models with dependent inputs
Authors:
Marouane Il Idrissi,
Nicolas Bousquet,
Fabrice Gamboa,
Bertrand Iooss,
Jean-Michel Loubes
Abstract:
Performing an additive decomposition of arbitrary functions of random elements is paramount for global sensitivity analysis and, therefore, the interpretation of black-box models. The well-known seminal work of Hoeffding characterized the summands in such a decomposition in the particular case of mutually independent inputs. Going beyond the framework of independent inputs has been an ongoing challenge in the literature. Existing solutions have so far required constraining assumptions or suffer from a lack of interpretability. In this paper, we generalize Hoeffding's decomposition for dependent inputs under very mild conditions. For that purpose, we propose a novel framework to handle dependencies based on probability theory, functional analysis, and combinatorics. It allows for characterizing two reasonable assumptions on the dependence structure of the inputs: non-perfect functional dependence and non-degenerate stochastic dependence. We then show that any square-integrable, real-valued function of random elements respecting these two assumptions can be uniquely additively decomposed and offer a characterization of the summands using oblique projections. We then introduce and discuss the theoretical properties and practical benefits of the sensitivity indices that ensue from this decomposition. Finally, the decomposition is analytically illustrated on bivariate functions of Bernoulli inputs.
Submitted 11 September, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Shapley effects and proportional marginal effects for global sensitivity analysis: application to computed tomography scan organ dose estimation
Authors:
Anais Foucault,
Marouane Il Idrissi,
Bertrand Iooss,
Sophie Ancelet
Abstract:
Concerns have been raised about possible cancer risks after exposure to computed tomography (CT) scans in childhood. The health effects of ionizing radiation are then estimated from the absorbed dose to the organs of interest, which is calculated, for each CT scan, from dosimetric numerical models, like the one proposed in the NCICT software. Given that a dosimetric model depends on input parameters which are most often uncertain, the calculation of absorbed doses is inherently uncertain. A current methodological challenge in radiation epidemiology is thus to be able to account for dose uncertainty in risk estimation. An important preliminary step can be to identify the most influential input parameters involved in dose estimation, before modelling and accounting for their related uncertainty in radiation-induced health risk estimates. In this work, a variance-based global sensitivity analysis was performed to rank by influence the uncertain input parameters of the NCICT software involved in brain and red bone marrow dose estimation, for four classes of CT examinations. Two recent sensitivity indices, especially adapted to the case of dependent input parameters, were estimated, namely: the Shapley effects and the Proportional Marginal Effects (PME). This provides a first comparison of the respective behavior and usefulness of these two indices on a real medical application case. The conclusion is that Shapley effects and PME are intrinsically different, but complementary. Interestingly, we also observed that the proportional redistribution property of the PME allowed for a clearer importance hierarchy between the input parameters.
Submitted 2 June, 2023;
originally announced June 2023.
-
A comparison between Bayesian and ordinary kriging based on validation criteria: application to radiological characterisation
Authors:
Martin Wieskotten,
Marielle Crozet,
Bertrand Iooss,
Céline Lacaux,
Amandine Marrel
Abstract:
In decommissioning projects of nuclear facilities, the radiological characterisation step aims to estimate the quantity and spatial distribution of different radionuclides. To carry out the estimation, measurements are performed on site to obtain preliminary information. The usual industrial practice consists in applying spatial interpolation tools (such as the ordinary kriging method) on these data to predict the value of interest for the contamination (radionuclide concentration, radioactivity, etc.) at unobserved positions. This paper questions the ordinary kriging tool on the well-known problem of the overoptimistic prediction variances due to not taking into account uncertainties on the estimation of the kriging parameters (variance and range). To overcome this issue, the practical use of the Bayesian kriging method, where the model parameters are considered as random variables, is examined in depth. The usefulness of Bayesian kriging, whilst comparing its performance to that of ordinary kriging, is demonstrated in the small data context (which is often the case in decommissioning projects). This result is obtained via several numerical tests on different toy models, and using complementary validation criteria: the predictivity coefficient ($Q^2$), the Predictive Variance Adequacy (PVA), the $α$-Confidence Interval plot (and its associated Mean Squared Error alpha (MSEalpha)), and the Predictive Interval Adequacy (PIA). The latter is a new criterion adapted to the Bayesian kriging results. Finally, the same comparison is performed on a real dataset coming from the decommissioning project of the CEA Marcoule G3 reactor. It illustrates the practical interest of Bayesian kriging in industrial radiological characterisation.
Submitted 12 May, 2023;
originally announced May 2023.
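The predictivity coefficient named in the abstract above is commonly defined as one minus the ratio of the squared prediction error to the total sum of squares of the validation outputs. A minimal sketch of that usual definition follows; the function name is hypothetical, and the paper may use a variant (for instance a cross-validated form).

```python
import numpy as np


def predictivity_coefficient(y_true, y_pred):
    """Q2 = 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2).

    y_true: observed values on a validation set.
    y_pred: predictions of the metamodel (e.g. a kriging mean)
    at the same points.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = np.sum((y_true - y_pred) ** 2)
    total = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - residual / total
```

A value of 1 means perfect prediction on the validation points, while a value of 0 means the metamodel does no better than predicting the validation mean everywhere. Note that this criterion only assesses the mean predictor, which is why the abstract complements it with criteria on the prediction variances and intervals.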
-
On the coalitional decomposition of parameters of interest
Authors:
Marouane Il Idrissi,
Nicolas Bousquet,
Fabrice Gamboa,
Bertrand Iooss,
Jean-Michel Loubes
Abstract:
Understanding the behavior of a black-box model with probabilistic inputs can be based on the decomposition of a parameter of interest (e.g., its variance) into contributions attributed to each coalition of inputs (i.e., subsets of inputs). In this paper, we produce conditions for obtaining unambiguous and interpretable decompositions of very general parameters of interest. This allows recovering known decompositions, which hold under weaker assumptions than stated in the literature.
Submitted 6 January, 2023;
originally announced January 2023.
-
Proportional marginal effects for global sensitivity analysis
Authors:
Margot Herin,
Marouane Il Idrissi,
Vincent Chabridon,
Bertrand Iooss
Abstract:
Performing (variance-based) global sensitivity analysis (GSA) with dependent inputs has recently benefited from cooperative game theory concepts. By using this theory, despite the potential correlation between the inputs, meaningful sensitivity indices can be defined via allocation shares of the model output's variance to each input. The ``Shapley effects'', i.e., the Shapley values transposed to variance-based GSA problems, allowed for this suitable solution. However, these indices exhibit a particular behavior that can be undesirable: an exogenous input (i.e., which is not explicitly included in the structural equations of the model) can be associated with a strictly positive index when it is correlated to endogenous inputs. In the present work, the use of a different allocation, called the ``proportional values'', is investigated. A first contribution is to propose an extension of this allocation, suitable for variance-based GSA. Novel GSA indices are then proposed, called the ``proportional marginal effects'' (PME). The notion of exogeneity is formally defined in the context of variance-based GSA, and it is shown that the PME allow the distinction of exogenous variables, even when they are correlated to endogenous inputs. Moreover, their behavior is compared to the Shapley effects on analytical toy-cases and more realistic use-cases.
Submitted 24 October, 2022;
originally announced October 2022.
-
Quantile-constrained Wasserstein projections for robust interpretability of numerical and machine learning models
Authors:
Marouane Il Idrissi,
Nicolas Bousquet,
Fabrice Gamboa,
Bertrand Iooss,
Jean-Michel Loubes
Abstract:
Robustness studies of black-box models are recognized as a necessary task for numerical models based on structural equations and predictive models learned from data. These studies must assess the model's robustness to possible misspecification regarding its inputs (e.g., covariate shift). The study of black-box models, through the prism of uncertainty quantification (UQ), is often based on sensitivity analysis involving a probabilistic structure imposed on the inputs, while ML models are solely constructed from observed data. Our work aims at unifying the UQ and ML interpretability approaches, by providing relevant and easy-to-use tools for both paradigms. To provide a generic and understandable framework for robustness studies, we define perturbations of input information relying on quantile constraints and projections with respect to the Wasserstein distance between probability measures, while preserving their dependence structure. We show that this perturbation problem can be analytically solved. Ensuring regularity constraints by means of isotonic polynomial approximations leads to smoother perturbations, which can be more suitable in practice. Numerical experiments on real case studies, from the UQ and ML fields, highlight the computational feasibility of such studies and provide local and global insights on the robustness of black-box models to input perturbations.
Submitted 23 September, 2022;
originally announced September 2022.
-
Global sensitivity analysis using derivative-based sparse Poincaré chaos expansions
Authors:
Nora Lüthen,
Olivier Roustant,
Fabrice Gamboa,
Bertrand Iooss,
Stefano Marelli,
Bruno Sudret
Abstract:
Variance-based global sensitivity analysis, in particular Sobol' analysis, is widely used for determining the importance of input variables to a computational model. Sobol' indices can be computed cheaply based on spectral methods like polynomial chaos expansions (PCE). Another choice is the recently developed Poincaré chaos expansions (PoinCE), whose orthonormal tensor-product basis is generated from the eigenfunctions of one-dimensional Poincaré differential operators. In this paper, we show that the Poincaré basis is the unique orthonormal basis with the property that partial derivatives of the basis again form an orthogonal basis with respect to the same measure as the original basis. This special property makes PoinCE ideally suited for incorporating derivative information into the surrogate modelling process. Assuming that partial derivative evaluations of the computational model are available, we compute spectral expansions in terms of Poincaré basis functions or basis partial derivatives, respectively, by sparse regression. We show on two numerical examples that the derivative-based expansions provide accurate estimates for Sobol' indices, even outperforming PCE in terms of bias and variance. In addition, we derive an analytical expression based on the PoinCE coefficients for a second popular sensitivity index, the derivative-based sensitivity measure (DGSM), and explore its performance as an upper bound to the corresponding total Sobol' indices.
Submitted 9 June, 2023; v1 submitted 1 July, 2021;
originally announced July 2021.
-
Sample selection from a given dataset to validate machine learning models
Authors:
Bertrand Iooss
Abstract:
The selection of a validation basis from a full dataset is often required in the industrial use of supervised machine learning algorithms. This validation basis serves to carry out an independent evaluation of the machine learning model. To select this basis, we propose to adopt a "design of experiments" point of view, by using statistical criteria. We show that the "support points" concept, based on Maximum Mean Discrepancy criteria, is particularly relevant. An industrial test case from the company EDF illustrates the practical interest of the methodology.
Submitted 27 April, 2021;
originally announced April 2021.
-
Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs
Authors:
Marouane Il Idrissi,
Vincent Chabridon,
Bertrand Iooss
Abstract:
Reliability-oriented sensitivity analysis methods have been developed for understanding the influence of model inputs relative to events which characterize the failure of a system (e.g., a threshold exceedance of the model output). In this field, the target sensitivity analysis focuses primarily on capturing the influence of the inputs on the occurrence of such a critical event. This paper proposes new target sensitivity indices, based on the Shapley values and called "target Shapley effects", allowing for interpretable sensitivity measures under dependent inputs. Two algorithms (one based on Monte Carlo sampling, and a given-data algorithm based on a nearest-neighbors procedure) are proposed for the estimation of these target Shapley effects based on the $\ell^2$ norm. Additionally, the behavior of these target Shapley effects is theoretically and empirically studied through various toy-cases. Finally, the application of these new indices in two real-world use-cases (a river flood model and a COVID-19 epidemiological model) is discussed.
Submitted 19 May, 2021; v1 submitted 20 January, 2021;
originally announced January 2021.
-
A graph clustering approach to localization for adaptive covariance tuning in data assimilation based on state-observation mapping
Authors:
Sibo Cheng,
Jean-Philippe Argaud,
Bertrand Iooss,
Angélique Ponçot,
Didier Lucor
Abstract:
An original graph clustering approach to efficient localization of error covariances is proposed within an ensemble-variational data assimilation framework. Here the term localization is very generic and refers to the idea of breaking up a global assimilation into subproblems. This unsupervised localization technique based on a linearized state-observation measure is general and does not rely on any prior information such as relevant spatial scales, empirical cut-off radius or homogeneity assumptions. It automatically segregates the state and observation variables into an optimal number of clusters (otherwise called subspaces or communities), more amenable to scalable data assimilation. The application of this method does not require underlying block-diagonal structures of prior covariance matrices. In order to deal with inter-cluster connectivity, two alternative data adaptations are proposed. Once the localization is completed, an adaptive covariance diagnosis and tuning is performed within each cluster. Numerical tests show that this approach is less costly and more flexible than a global covariance tuning, and most often results in more accurate background and observation error covariances.
Submitted 31 January, 2020;
originally announced January 2020.
-
Background Error Covariance Iterative Updating with Invariant Observation Measures for Data Assimilation
Authors:
Sibo Cheng,
Jean-Philippe Argaud,
Bertrand Iooss,
Didier Lucor,
Angélique Ponçot
Abstract:
In order to leverage the information embedded in the background state and observations, covariance matrices modelling is a pivotal point in data assimilation algorithms. These matrices are often estimated from an ensemble of observations or forecast differences. Nevertheless, for many industrial applications the modelling still remains empirical based on some form of expertise and physical constraints enforcement in the absence of historical observations or predictions. We have developed two novel robust adaptive assimilation methods named CUTE (Covariance Updating iTerativE) and PUB (Partially Updating BLUE). These two non-parametric methods are based on different optimization objectives, both capable of sequentially adapting background error covariance matrices in order to improve assimilation results under the assumption of a good knowledge of the observation error covariances. We have compared these two methods with the standard approach using a misspecified background matrix in a shallow water twin experiments framework with a linear observation operator. Numerical experiments have shown that the proposed methods bear a real advantage both in terms of posterior error correlation identification and assimilation accuracy.
Submitted 11 October, 2019;
originally announced October 2019.
-
Optimal Uncertainty Quantification of a risk measurement from a thermal-hydraulic code using Canonical Moments
Authors:
Jerome Stenger,
Fabrice Gamboa,
Merlin Keller,
Bertrand Iooss
Abstract:
We study an industrial computer code related to nuclear safety. A major topic of interest is to assess the uncertainties tainting the results of a computer simulation. In this work we gain robustness on the quantification of a risk measurement by accounting for all sources of uncertainties tainting the inputs of a computer code. To that end, we evaluate the maximum quantile over a class of distributions defined only by constraints on their moments. Two options are available when dealing with such complex optimization problems: one can either optimize under constraints or, preferably, reformulate the objective function. We identify a well-suited parameterization to compute the optimal quantile based on the theory of canonical moments. It allows an effective, constraint-free optimization.
Submitted 28 August, 2019; v1 submitted 22 January, 2019;
originally announced January 2019.
-
Advanced methodology for uncertainty propagation in computer experiments with large number of inputs
Authors:
Bertrand Iooss,
Amandine Marrel
Abstract:
In the framework of the estimation of safety margins in nuclear accident analysis, a quantitative assessment of the uncertainties tainting the results of computer simulations is essential. Accurate uncertainty propagation (estimation of high probabilities or quantiles) and quantitative sensitivity analysis may call for several thousand code simulations. Complex computer codes, such as the ones used in thermal-hydraulic accident scenario simulations, are often too CPU-time expensive to be directly used to perform these studies. A solution consists in replacing the computer model by a CPU-inexpensive mathematical function, called a metamodel, built from a reduced number of code simulations. However, in the case of high-dimensional experiments (with typically several tens of inputs), the metamodel building process remains difficult. To face this limitation, we propose a methodology which combines several advanced statistical tools: initial space-filling design, screening to identify the non-influential inputs, Gaussian process (Gp) metamodel building with the group of influential inputs as explanatory variables. The residual effect of the group of non-influential inputs is captured by another Gp metamodel. Then, the resulting joint Gp metamodel is used to accurately estimate Sobol' sensitivity indices and high quantiles (here the $95\%$-quantile). The efficiency of the methodology to deal with a large number of inputs and reduce the calculation budget is illustrated on a thermal-hydraulic calculation case simulating with the CATHARE2 code a Loss Of Coolant Accident scenario in a Pressurized Water Reactor. A predictive Gp metamodel is built with only a few hundred code simulations and allows the calculation of the Sobol' sensitivity indices. This Gp also provides a more accurate estimation of the 95%-quantile and associated confidence interval than the empirical approach, at equal calculation budget. Moreover, on this test case, the joint Gp approach outperforms the simple Gp.
Submitted 29 December, 2018;
originally announced December 2018.
-
Optimal Uncertainty Quantification on moment class using canonical moments
Authors:
Jerome Stenger,
Fabrice Gamboa,
Merlin Keller,
Bertrand Iooss
Abstract:
We gain robustness on the quantification of a risk measurement by accounting for all sources of uncertainties tainting the inputs of a computer code. We evaluate the maximum quantile over a class of distributions defined only by constraints on their moments. The methodology is based on the theory of canonical moments that appears to be a well-suited framework for practical optimization.
Submitted 30 November, 2018;
originally announced November 2018.
-
Probabilistic risk bounds for the characterization of radiological contamination
Authors:
Géraud Blatman,
Thibault Delage,
Bertrand Iooss,
Nadia Pérot
Abstract:
The radiological characterization of contaminated elements (walls, grounds, objects) from nuclear facilities often suffers from too small a number of measurements. In order to determine risk prediction bounds on the level of contamination, some classic statistical methods may then prove unsuitable as they rely upon strong assumptions (e.g. that the underlying distribution is Gaussian) which cannot be checked. Assuming that a set of measurements or their average value arises from a Gaussian distribution can sometimes lead to erroneous conclusions, possibly underconservative. This paper presents several alternative statistical approaches which are based on much weaker hypotheses than Gaussianity. They result from general probabilistic inequalities and order-statistics-based formulas. Given a data sample, these inequalities make it possible to derive prediction intervals for a random variable, which can be directly interpreted as probabilistic risk bounds. For the sake of validation, they are first applied to synthetic data samples generated from several known theoretical distributions. The proposed methods are then applied to two data sets obtained from real radiological contamination measurements.
Submitted 27 May, 2017; v1 submitted 12 December, 2016;
originally announced January 2017.
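To illustrate the kind of order-statistics-based bound the abstract above refers to, here is a minimal sketch of one textbook result. It is not necessarily among the paper's own formulas. It relies only on the assumption that the measurements are i.i.d. draws from a continuous distribution, in which case a future draw falls below the k-th smallest observed value with probability k / (n + 1). The function name `upper_prediction_bound` and the sample values are hypothetical.

```python
def upper_prediction_bound(sample, level):
    """Distribution-free upper bound for one future observation.

    Assumes i.i.d. draws from a continuous distribution: the next draw
    falls below the k-th smallest observed value with probability
    k / (n + 1). Returns the smallest order statistic whose probability
    reaches `level`, or None if the sample is too small to reach it.
    """
    ordered = sorted(sample)
    n = len(ordered)
    for k in range(1, n + 1):
        if k / (n + 1) >= level:
            return ordered[k - 1]
    return None


# Hypothetical measurements (illustrative values only, not real data).
measurements = [0.8, 1.1, 0.5, 2.3, 1.7, 0.9, 1.4, 3.0, 1.2]
bound = upper_prediction_bound(measurements, 0.85)
```

With only nine measurements, no order statistic can guarantee a 95% level (the best achievable is 9/10), which is why the function returns `None` in that case. This mirrors the small-sample difficulty the abstract describes.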
-
Poincaré inequalities on intervals -- application to sensitivity analysis
Authors:
Olivier Roustant,
Franck Barthe,
Bertrand Iooss
Abstract:
The development of global sensitivity analysis of numerical model outputs has recently raised new issues on 1-dimensional Poincaré inequalities. Typically, two kinds of sensitivity indices are linked by a Poincaré-type inequality, which provides an upper bound on the most interpretable index in terms of the other one, which is cheaper to compute. This allows a low-cost screening of unessential variables. The efficiency of this screening then depends heavily on the accuracy of the upper bounds in Poincaré inequalities. The novelty of these questions concerns the wide range of probability distributions involved, which are often truncated on intervals. After providing an overview of the existing knowledge and techniques, we add some theory about Poincaré constants on intervals, with improvements for symmetric intervals. Then we exploit the spectral interpretation to compute the exact value of the Poincaré constant of any admissible distribution on a given interval. We give semi-analytical results for some frequent distributions (truncated exponential, triangular, truncated normal), and present a numerical method for the general case. Finally, an application to a hydrological problem shows the benefits of the new results on Poincaré inequalities for sensitivity analysis.
Submitted 12 December, 2016;
originally announced December 2016.
-
Model Assisted Probability of Detection curves: New statistical tools and progressive methodology
Authors:
Loïc Le Gratiet,
Bertrand Iooss,
Géraud Blatman,
Thomas Browne,
Sara Cordeiro,
Benjamin Goursaud
Abstract:
The Probability Of Detection (POD) curve is a standard tool in several industries to evaluate the performance of Non Destructive Testing (NDT) procedures for the detection of defects that are harmful to the inspected structure. Owing to new capabilities for the numerical simulation of NDT processes, Model Assisted Probability of Detection (MAPOD) approaches have also been developed recently. In this paper, a generic and progressive MAPOD methodology is proposed. The limits and assumptions of the classical methods are highlighted, and new metamodel-based methods are proposed. They give access to relevant information based on sensitivity analysis of the MAPOD inputs. Applications are performed on Eddy Current Non Destructive Examination numerical data.
Submitted 22 January, 2016;
originally announced January 2016.
-
Stochastic simulators based optimization by Gaussian process metamodels -- Application to maintenance investments planning issues
Authors:
Thomas Browne,
Bertrand Iooss,
Loïc Le Gratiet,
Jérôme Lonchampt,
Emmanuel Remy
Abstract:
This paper deals with the optimization of industrial asset management strategies, whose profitability is characterized by the Net Present Value (NPV) indicator, which is assessed by a Monte Carlo simulator. The developed method consists in building a metamodel of this stochastic simulator, which yields, for a given model input, the NPV probability distribution without running the simulator. The present work concentrates on the emulation of the quantile function of the stochastic simulator by interpolating well-chosen basis functions and metamodeling their coefficients (using the Gaussian process metamodel). This quantile function metamodel is then used to treat a maintenance strategy optimization problem (four systems installed on different plants), in order to optimize an NPV quantile. Using the Gaussian process framework, an adaptive design method (called QFEI) is defined by extending the well-known EGO algorithm to our case. This makes it possible to obtain an "optimal" solution using a small number of simulator runs.
Submitted 3 May, 2016; v1 submitted 22 December, 2015;
originally announced December 2015.
-
Open TURNS: An industrial software for uncertainty quantification in simulation
Authors:
Michaël Baudin,
Anne Dutfoy,
Bertrand Iooss,
Anne-Laure Popelin
Abstract:
The needs to assess robust performances for complex systems and to answer tighter regulatory processes (security, safety, environmental control, and health impacts, etc.) have led to the emergence of a new industrial simulation challenge: to take uncertainties into account when dealing with complex numerical simulation frameworks.
Therefore, a generic methodology has emerged from the joint effort of several industrial companies and academic institutions.
EDF R&D, Airbus Group and Phimeca Engineering started a collaboration at the beginning of 2005, joined by IMACS in 2014, for the development of an Open Source software platform dedicated to uncertainty propagation by probabilistic methods, named OpenTURNS for Open source Treatment of Uncertainty, Risk 'N Statistics.
OpenTURNS addresses the specific industrial challenges attached to uncertainties, which are transparency, genericity, modularity and multi-accessibility.
This paper focuses on OpenTURNS and presents its main features: OpenTURNS is open source software under the LGPL license, which presents itself as a C++ library and a Python TUI, and which works in Linux and Windows environments. All the methodological tools are described in the different sections of this paper: uncertainty quantification, uncertainty propagation, sensitivity analysis and metamodeling. A section also explains the generic wrapper mechanism for linking OpenTURNS to any external code.
The paper illustrates as much as possible the methodological tools on an educational example that simulates the height of a river and compares it to the height of a dyke that protects industrial facilities.
Finally, it gives an overview of the main developments planned for the next few years.
Submitted 5 June, 2015; v1 submitted 21 January, 2015;
originally announced January 2015.
-
A review on global sensitivity analysis methods
Authors:
Bertrand Iooss,
Paul Lemaître
Abstract:
This chapter reviews, in a complete methodological framework, various global sensitivity analysis methods for model output. Numerous statistical and probabilistic tools (regression, smoothing, tests, statistical learning, Monte Carlo, ...) aim at determining the model input variables which contribute most to a quantity of interest depending on the model output. This quantity can be, for instance, the variance of an output variable. Three kinds of methods are distinguished: screening (coarse sorting of the most influential inputs among a large number), measures of importance (quantitative sensitivity indices) and deep exploration of the model behaviour (measuring the effects of inputs over their whole range of variation). A progressive application methodology is illustrated on a textbook application. A synthesis is given to place every method along several axes, mainly the cost in number of model evaluations, the model complexity and the nature of the information provided.
Submitted 9 April, 2014;
originally announced April 2014.
-
Visualization tools for uncertainty and sensitivity analyses on thermal-hydraulic transients
Authors:
Anne-Laure Popelin,
Bertrand Iooss
Abstract:
In nuclear engineering studies, uncertainty and sensitivity analyses of simulation computer codes can be confronted with the complexity of the input and/or output variables. If these variables represent a transient or a spatial phenomenon, the difficulty is to provide tools adapted to their functional nature. In this paper, we describe useful visualization tools in the context of uncertainty analysis of model transient outputs. Our application involves thermal-hydraulic computations for safety studies of nuclear pressurized water reactors.
Submitted 28 February, 2014;
originally announced February 2014.
-
Numerical studies of the metamodel fitting and validation processes
Authors:
Bertrand Iooss,
Loïc Boussouf,
Vincent Feuillard,
Amandine Marrel
Abstract:
Complex computer codes, for instance those simulating physical phenomena, are often too time-consuming to be used directly to perform uncertainty, sensitivity, optimization and robustness analyses. A widely accepted method to circumvent this problem consists in replacing CPU-time-expensive computer models by CPU-inexpensive mathematical functions, called metamodels. In this paper, we focus on the Gaussian process metamodel and two essential steps of its definition phase. First, the initial design of the computer code input variables (which is used to fit the metamodel) has to honor adequate space-filling properties. We adopt a numerical approach to compare the performance of different types of space-filling designs, in the class of optimal Latin hypercube samples, in terms of the predictivity of the subsequently fitted metamodel. We conclude that such samples with minimal wrap-around discrepancy are particularly well-suited for Gaussian process metamodel fitting. Second, the metamodel validation process consists in evaluating the metamodel predictivity with respect to the initial computer code. We propose and test an algorithm which optimizes the distance between the validation points and the metamodel learning points in order to estimate the true metamodel predictivity with a minimum number of validation points. Comparisons with classical validation algorithms and an application to a nuclear safety computer code show the relevance of this new sequential validation design.
Submitted 23 September, 2010; v1 submitted 7 January, 2010;
originally announced January 2010.
-
Global sensitivity analysis for models with spatially dependent outputs
Authors:
Amandine Marrel,
Bertrand Iooss,
Michel Jullien,
Beatrice Laurent,
Elena Volkova
Abstract:
The global sensitivity analysis of a complex numerical model often calls for the estimation of variance-based importance measures, named Sobol' indices. Metamodel-based techniques have been developed in order to replace the cpu time-expensive computer code with an inexpensive mathematical function, which predicts the computer code output. The common metamodel-based sensitivity analysis methods are well-suited for computer codes with scalar outputs. However, in the environmental domain, as in many areas of application, the numerical model outputs are often spatial maps, which may also vary with time. In this paper, we introduce an innovative method to obtain a spatial map of Sobol' indices with a minimal number of numerical model computations. It is based upon the functional decomposition of the spatial output onto a wavelet basis and the metamodeling of the wavelet coefficients by the Gaussian process. An analytical example is presented to clarify the various steps of our methodology. This technique is then applied to a real hydrogeological case: for each model input variable, a spatial map of Sobol' indices is thus obtained.
Submitted 23 September, 2010; v1 submitted 6 November, 2009;
originally announced November 2009.
-
Latin hypercube sampling with inequality constraints
Authors:
Matthieu Petelet,
Bertrand Iooss,
Olivier Asserin,
Alexandre Loredo
Abstract:
In some studies requiring predictive and CPU-time-consuming numerical models, the sampling design of the model input variables has to be chosen with caution. For this purpose, Latin hypercube sampling has a long history and has shown its robustness. In this paper we propose and discuss a new algorithm to build a Latin hypercube sample (LHS) taking into account inequality constraints between the sampled variables. This technique, called constrained Latin hypercube sampling (cLHS), consists in performing permutations on an initial LHS to honor the desired monotonic constraints. The relevance of this approach is shown on a real example concerning numerical welding simulation, where the inequality constraints are caused by the physical decrease of some material properties as a function of temperature.
Submitted 23 September, 2010; v1 submitted 2 September, 2009;
originally announced September 2009.
-
Controlled stratification for quantile estimation
Authors:
Claire Cannamela,
Josselin Garnier,
Bertrand Iooss
Abstract:
In this paper we propose and discuss variance reduction techniques for the estimation of quantiles of the output of a complex model with random input parameters. These techniques are based on the use of a reduced model, such as a metamodel or a response surface. The reduced model can be used as a control variate; or a rejection method can be implemented to sample the realizations of the input parameters in prescribed relevant strata; or the reduced model can be used to determine a good biased distribution of the input parameters for the implementation of an importance sampling strategy. The different strategies are analyzed and the asymptotic variances are computed, which shows the benefit of an adaptive controlled stratification method. This method is finally applied to a real example (computation of the peak cladding temperature during a large-break loss of coolant accident in a nuclear reactor).
Submitted 27 January, 2009; v1 submitted 18 February, 2008;
originally announced February 2008.
-
An efficient methodology for modeling complex computer codes with Gaussian processes
Authors:
Amandine Marrel,
Bertrand Iooss,
Francois Van Dorpe,
Elena Volkova
Abstract:
Complex computer codes are often too time-consuming to be used directly to perform uncertainty propagation studies or global sensitivity analysis, or to solve optimization problems. A well-known and widely used method to circumvent this inconvenience consists in replacing the complex computer code by a reduced model, called a metamodel or a response surface, that represents the computer code and requires acceptable calculation time. One particular class of metamodels is studied: the Gaussian process model, which is characterized by its mean and covariance functions. A specific estimation procedure is developed to fit a Gaussian process model in complex cases (nonlinear relations, highly dispersed or discontinuous output, high-dimensional input, inadequate sampling designs, ...). The efficiency of this algorithm is compared to the efficiency of other existing algorithms on an analytical test case. The proposed methodology is also illustrated for the case of a complex hydrogeological computer code simulating radionuclide transport in groundwater.
Submitted 6 April, 2008; v1 submitted 8 February, 2008;
originally announced February 2008.
-
Global sensitivity analysis of computer models with functional inputs
Authors:
Bertrand Iooss,
Mathieu Ribatet
Abstract:
Global sensitivity analysis is used to quantify the influence of uncertain input parameters on the response variability of a numerical model. The common quantitative methods are applicable to computer codes with scalar input variables. This paper aims to illustrate different variance-based sensitivity analysis techniques, based on the so-called Sobol indices, when some input variables are functional, such as stochastic processes or random spatial fields. In this work, we focus on computer codes with large CPU time, which need a preliminary metamodeling step before performing the sensitivity analysis. We propose the use of the joint modeling approach, i.e., modeling simultaneously the mean and the dispersion of the code outputs using two interlinked Generalized Linear Models (GLM) or Generalized Additive Models (GAM). The "mean" model makes it possible to estimate the sensitivity indices of each scalar input variable, while the "dispersion" model makes it possible to derive the total sensitivity index of the functional input variables. The proposed approach is compared to some classical SA methodologies on an analytical function. Lastly, the proposed methodology is applied to a concrete industrial computer code that simulates nuclear fuel irradiation.
Submitted 9 June, 2008; v1 submitted 7 February, 2008;
originally announced February 2008.
-
Calculations of Sobol indices for the Gaussian process metamodel
Authors:
Amandine Marrel,
Bertrand Iooss,
Beatrice Laurent,
Olivier Roustant
Abstract:
Global sensitivity analysis of complex numerical models can be performed by calculating variance-based importance measures of the input variables, such as the Sobol indices. However, these techniques, which require a large number of model evaluations, are often impractical for time-expensive computer codes. A well-known and widely used solution consists in replacing the computer code by a metamodel, which predicts the model responses with a negligible computation time and makes the estimation of Sobol indices straightforward. In this paper, we discuss the Gaussian process model, which gives analytical expressions of Sobol indices. Two approaches are studied to compute the Sobol indices: the first based on the predictor of the Gaussian process model and the second based on the global stochastic process model. Comparisons between the two estimates, made on analytical examples, show the superiority of the second approach in terms of convergence and robustness. Moreover, the second approach makes it possible to account for the modeling error of the Gaussian process model by directly giving confidence intervals on the Sobol indices. These techniques are finally applied to a real case of hydrogeological modeling.
Submitted 7 February, 2008;
originally announced February 2008.
-
Global Sensitivity Analysis of Stochastic Computer Models with joint metamodels
Authors:
Bertrand Iooss,
Mathieu Ribatet,
Amandine Marrel
Abstract:
The global sensitivity analysis method, used to quantify the influence of uncertain input variables on the response variability of a numerical model, is applicable to deterministic computer code (for which the same set of input variables always gives the same output value). This paper proposes a global sensitivity analysis methodology for stochastic computer code (having a variability induced by some uncontrollable variables). The framework of the joint modeling of the mean and dispersion of heteroscedastic data is used. To deal with the complexity of computer experiment outputs, nonparametric joint models (based on Generalized Additive Models and Gaussian processes) are discussed. The relevance of these new models is analyzed in terms of the obtained variance-based sensitivity indices with two case studies. Results show that the joint modeling approach leads to accurate sensitivity index estimates even when clear heteroscedasticity is present.
Submitted 8 June, 2009; v1 submitted 4 February, 2008;
originally announced February 2008.