Search | arXiv e-print repository

Importance sampling for Sobol' indices estimation

Authors: Haythem Boucharif, Jérôme Morio, Paul Rochet

Abstract: We propose a new importance sampling framework for the estimation and analysis of Sobol' indices. We show that a Sobol' index defined under a reference input distribution can be consistently estimated from samples drawn from other sampling distributions by reweighting the estimator appropriately to account for the distribution change. We derive the optimal sampling distribution that minimizes the asymptotic variance and demonstrate its strong impact on estimation accuracy. Beyond variance reduction, the framework supports distributional sensitivity analysis via reverse importance sampling, enabling robust exploration of input distribution uncertainty with negligible additional computational cost.

Submitted 8 July, 2025; originally announced July 2025.
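
The reweighting idea in the abstract above can be illustrated on a much simpler quantity than a Sobol' index. The sketch below is not the authors' estimator: it assumes a standard normal reference distribution and a wider normal sampling distribution, and it estimates a plain second moment with self-normalized weights. All names are hypothetical.

```python
import random
from statistics import NormalDist


def reweighted_mean(g, samples, reference, sampling):
    """Self-normalized importance sampling estimate of E_reference[g(X)].

    `samples` are drawn from `sampling`; each one is weighted by the
    density ratio reference.pdf(x) / sampling.pdf(x).
    """
    weights = [reference.pdf(x) / sampling.pdf(x) for x in samples]
    total = sum(w * g(x) for w, x in zip(weights, samples))
    return total / sum(weights)


# Illustrative setup (an assumption, not taken from the paper).
reference = NormalDist(mu=0.0, sigma=1.0)  # distribution of interest
sampling = NormalDist(mu=0.0, sigma=2.0)   # distribution actually sampled
rng = random.Random(0)
draws = [rng.gauss(0.0, 2.0) for _ in range(20000)]

# E[X^2] under the reference N(0, 1) equals 1, so the estimate should be close to 1.
estimate = reweighted_mean(lambda x: x * x, draws, reference, sampling)
print(round(estimate, 2))
```

The same set of draws can be reused with a different `reference` object, which is the basic mechanism that makes exploring several input distributions cheap once the model outputs are available.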

MSC Class: 62-08; 62G20; 62P30; 65C05

arXiv:2407.15468

Asymptotic efficiency for Sobol' and Cramér-von Mises indices under two designs of experiments

Authors: Thierry Klein, Agnès Lagnoux, Paul Rochet, Thi Mong Ngoc Nguyen

Abstract: A variety of indices aim to quantify the impact of input variables on a response, typically the output from a complex computer code or black-box model. Most commonly used, the Sobol' index typically measures the influence of some inputs from an explained variance perspective. However, some situations may require a more targeted analysis of some inputs' influence. With no prior information, distribution-based measures appear to be appealing. For this purpose, so-called Cramér-von Mises indices (and their generalization) have been proposed in the literature, defined as an excess probability integrated over the output distribution that aim to reflect influence on the whole distribution of the output rather than on the variance solely. Inference of these various indices has remained a challenging topic especially in presence of many inputs. While several Sobol' indices estimators are known to be optimal under regularity conditions, the issue of asymptotic efficiency for Cramér-von Mises indices has been unaddressed in the literature so far. For these indices, we derive in this paper the efficiency bounds and discuss the known methods to achieve such optimal bounds. Two estimation contexts are considered: the so-called Pick-Freeze scheme and the Given-Data setting, for which the estimation is produced from a unique input-output sample.

Submitted 21 July, 2025; v1 submitted 22 July, 2024; originally announced July 2024.

arXiv:2306.05842

Efficiency of the averaged rank-based estimator for first order Sobol index inference

Authors: Thierry Klein, Paul Rochet

Abstract: Among the many estimators of first order Sobol indices that have been proposed in the literature, the so-called rank-based estimator is arguably the simplest to implement. This estimator can be viewed as the empirical auto-correlation of the response variable sample obtained upon reordering the data by increasing values of the inputs. This simple idea can be extended to higher lags of autocorrelation, thus providing several competing estimators of the same parameter. We show that these estimators can be combined in a simple manner to achieve the theoretical variance efficiency bound asymptotically.

Submitted 9 June, 2023; originally announced June 2023.
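
The description of the rank-based estimator as an autocorrelation of the reordered responses can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's exact estimator: the circular handling of the last observations, the plug-in variance, and the function name are choices made here, and the optimal combination of lags studied in the paper is not implemented.

```python
import random
from statistics import fmean, pvariance


def rank_based_sobol(x, y, lag=1):
    """Lag-`lag` empirical autocorrelation of y after sorting the pairs by x."""
    n = len(y)
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    mean_y = fmean(y_sorted)
    # Products of each response with the one `lag` ranks later (wrapping around).
    cross = fmean([y_sorted[i] * y_sorted[(i + lag) % n] for i in range(n)])
    return (cross - mean_y ** 2) / pvariance(y_sorted)


rng = random.Random(0)
x = [rng.random() for _ in range(5000)]
y_driven = [xi ** 2 for xi in x]            # output fully determined by x
y_noise = [rng.gauss(0.0, 1.0) for _ in x]  # output unrelated to x

print(round(rank_based_sobol(x, y_driven), 3))
print(round(rank_based_sobol(x, y_noise), 3))
```

When the output is a function of the input alone, the estimate should be close to 1; when the output is independent of the input, it should be close to 0. Passing `lag=2`, `lag=3`, and so on gives the competing estimators mentioned in the abstract.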

arXiv:2203.16977

Test comparison for Sobol Indices over nested sets of variables

Authors: Thierry Klein, Nicolas Peteilh, Paul Rochet

Abstract: Sensitivity indices are commonly used to quantify the relative influence of any specific group of input variables on the output of a computer code. One crucial question is then to decide whether a given set of variables has a significant impact on the output. Sobol indices are often used to measure this impact but their estimation can be difficult as they usually require a particular design of experiment. In this work, we take advantage of the monotonicity of Sobol indices with respect to set inclusion to test the influence of some of the input variables. The method does not rely on a direct estimation of the Sobol indices and can be performed under classical iid sampling designs.

Submitted 4 April, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

arXiv:1904.10720

A coupling of the spectral measures at a vertex

Authors: Thibault Espinasse, Paul Rochet

Abstract: Given the adjacency matrix of an undirected graph, we define a coupling of the spectral measures at the vertices, whose moments count the rooted closed paths in the graph. The resulting joint spectral measure verifies numerous interesting properties that allow one to recover minors of analytical functions of the adjacency matrix from its generalized moments. We prove an extension of Obata's Central Limit Theorem in growing star-graphs to the multivariate case and discuss some combinatorial properties using Viennot's heaps of pieces point of view.

Submitted 24 April, 2019; originally announced April 2019.

arXiv:1802.09370

Averaging of density kernel estimators

Authors: O. Chernova, F. Lavancier, P. Rochet

Abstract: Averaging provides an alternative to bandwidth selection for density kernel estimation. We propose a procedure to combine linearly several kernel estimators of a density obtained from different, possibly data-driven, bandwidths. The method relies on minimizing an easily tractable approximation of the integrated square error of the combination. It provides, at a small computational cost, a final solution that improves on the initial estimators in most cases. The average estimator is proved to be asymptotically as efficient as the best possible combination (the oracle), with an error term that decreases faster than the minimax rate obtained with separated learning and validation samples. The performances are tested numerically, with results that compare favorably to other existing procedures in terms of mean integrated square errors.

Submitted 4 November, 2019; v1 submitted 26 February, 2018; originally announced February 2018.

arXiv:1607.00902

doi 10.1016/j.disc.2017.10.002

An Hopf algebra for counting simple cycles

Authors: Pierre-Louis Giscard, Paul Rochet, Richard Wilson

Abstract: Simple cycles, also known as self-avoiding polygons, are cycles on graphs which are not allowed to visit any vertex more than once. We present an exact formula for enumerating the simple cycles of any length on any directed graph involving a sum over its induced subgraphs. This result stems from an Hopf algebra, which we construct explicitly, and which provides further means of counting simple cycles. Finally, we obtain a more general theorem asserting that any Lie idempotent can be used to enumerate simple cycles.

Submitted 27 February, 2017; v1 submitted 4 July, 2016; originally announced July 2016.

Journal ref: Journal of Discrete Mathematics 2017

arXiv:1607.00864

A tutorial on estimator averaging in spatial point process models

Authors: Frédéric Lavancier, P Rochet

Abstract: Assume that several competing methods are available to estimate a parameter in a given statistical model. The aim of estimator averaging is to provide a new estimator, built as a linear combination of the initial estimators, that achieves better properties, under the quadratic loss, than each individual initial estimator. This contribution provides an accessible and clear overview of the method, and investigates its performances on standard spatial point process models. It is demonstrated that the average estimator clearly improves on standard procedures for the considered models. For each example, the code to implement the method with the R software (which only consists of few lines) is provided.

Submitted 7 March, 2017; v1 submitted 4 July, 2016; originally announced July 2016.

arXiv:1606.08974

The Mean/Max Statistic in Extreme Value Analysis

Authors: Paul Rochet, Isabel Serra

Abstract: Most extreme events in real life can be faithfully modeled as random realizations from a Generalized Pareto distribution, which depends on two parameters: the scale and the shape. In many actual situations, one is mostly concerned with the shape parameter, also called tail index, as it contains the main information on the likelihood of extreme events. In this paper, we show that the mean/max statistic, that is the empirical mean divided by the maximal value of the sample, constitutes an ideal normalization to study the tail index independently of the scale. This statistic appears naturally when trying to distinguish between uniform and exponential distributions, the two transitional phases of the Generalized Pareto model. We propose a simple methodology based on the mean/max statistic to detect, classify and infer on the tail of the distribution of a sample. Applications to seismic events and detection of saturation in experimental measurements are presented.

Submitted 29 June, 2016; originally announced June 2016.
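
The mean/max statistic defined in the abstract above is simple enough to compute directly. The sketch below only illustrates the definition on simulated uniform and exponential samples; it assumes positive data, the sample sizes and seed are arbitrary, and the detection and classification methodology of the paper is not reproduced.

```python
import random
from statistics import fmean


def mean_max(sample):
    """Empirical mean divided by the sample maximum (positive data assumed)."""
    return fmean(sample) / max(sample)


rng = random.Random(1)
n = 10000
uniform_sample = [rng.random() for _ in range(n)]
exponential_sample = [rng.expovariate(1.0) for _ in range(n)]

print(round(mean_max(uniform_sample), 3))
print(round(mean_max(exponential_sample), 3))
```

For a uniform sample the ratio settles near 1/2, whereas for an exponential sample the maximum grows like the logarithm of the sample size, so the ratio drifts slowly toward 0. Multiplying every observation by the same positive constant leaves the statistic unchanged, which is the scale invariance the abstract refers to.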

arXiv:1606.06011

A new class of graphs that satisfies the Chen-Chvátal Conjecture

Authors: Pierre Aboulker, Martin Matamala, Paul Rochet, Jose Zamora

Abstract: A well-known combinatorial theorem says that a set of n non-collinear points in the plane determines at least n distinct lines. Chen and Chvátal conjectured that this theorem extends to metric spaces, with an appropriate definition of line. In this work we prove a slightly stronger version of the Chen and Chvátal conjecture for a family of graphs containing chordal graphs and distance-hereditary graphs.

Submitted 20 June, 2016; originally announced June 2016.

arXiv:1606.00289

doi 10.1007/s00373-018-1966-9

Enumerating simple paths from connected induced subgraphs

Authors: Pierre-Louis Giscard, Paul Rochet

Abstract: We present an exact formula for the ordinary generating series of the simple paths between any two vertices of a graph. Our formula involves the adjacency matrix of the connected induced subgraphs and remains valid on weighted and directed graphs. As a particular case, we obtain a relation linking the Hamiltonian paths and cycles of a graph to its dominating connected sets.

Submitted 5 December, 2017; v1 submitted 1 June, 2016; originally announced June 2016.

arXiv:1603.08113

Reconstructing undirected graphs from eigenspaces

Authors: Yohann De Castro, Thibault Espinasse, Paul Rochet

Abstract: In this paper, we aim at recovering an undirected weighted graph of $N$ vertices from the knowledge of a perturbed version of the eigenspaces of its adjacency matrix $W$. For instance, this situation arises for stationary signals on graphs or for Markov chains observed at random times. Our approach is based on minimizing a cost function given by the Frobenius norm of the commutator $\mathsf{A} \mathsf{B}-\mathsf{B} \mathsf{A}$ between symmetric matrices $\mathsf{A}$ and $\mathsf{B}$. In the Erdős-Rényi model with no self-loops, we show that identifiability (i.e., the ability to reconstruct $W$ from the knowledge of its eigenspaces) follows a sharp phase transition on the expected number of edges with threshold function $N\log N/2$. Given an estimation of the eigenspaces based on an $n$-sample, we provide support selection procedures from theoretical and practical points of view. In particular, when deleting an edge from the active support, our study unveils that our test statistic is of the order of $\mathcal O(1/n)$ when we overestimate the true support and lower bounded by a positive constant when the estimated support is smaller than the true support. This feature leads to a powerful practical support estimation procedure. Simulated and real life numerical experiments assert our new methodology.

Submitted 15 March, 2017; v1 submitted 26 March, 2016; originally announced March 2016.

Comments: 25 pages, some figures. Final version
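
The cost function named in the abstract above, the Frobenius norm of a commutator, can be written out on a small example. This sketch only evaluates the norm for fixed matrices; the minimization and support selection procedures of the paper are not shown, and the matrices and helper names are made up for illustration.

```python
from math import sqrt


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def commutator_frobenius(a, b):
    """Frobenius norm of AB - BA for two square matrices of equal size."""
    ab, ba = matmul(a, b), matmul(b, a)
    size = len(a)
    return sqrt(sum((ab[i][j] - ba[i][j]) ** 2
                    for i in range(size) for j in range(size)))


# Hypothetical weighted adjacency matrix of a 3-vertex path graph.
W = [[0, 1, 0],
     [1, 0, 2],
     [0, 2, 0]]
W_squared = matmul(W, W)            # shares its eigenspaces with W
D = [[1, 0, 0],
     [0, 2, 0],
     [0, 0, 3]]                     # diagonal matrix with different eigenspaces

print(commutator_frobenius(W, W_squared))
print(commutator_frobenius(W, D))
```

A matrix that shares its eigenspaces with `W`, such as `W_squared`, commutes with it and gives a zero cost, while a matrix with different eigenspaces, such as `D`, gives a strictly positive one.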

arXiv:1601.01780

doi 10.1137/15M1054535

Algebraic combinatorics on trace monoids: extending number theory to walks on graphs

Authors: P. -L Giscard, P Rochet

Abstract: Partially commutative monoids provide a powerful tool to study graphs, viewing walks as words whose letters, the edges of the graph, obey a specific commutation rule. A particular class of traces emerges from this framework, the hikes, whose alphabet is the set of simple cycles on the graph. We show that hikes characterize undirected graphs uniquely, up to isomorphism, and satisfy remarkable algebraic properties such as the existence and uniqueness of a prime factorization. Because of this, the set of hikes partially ordered by divisibility hosts a plethora of relations in direct correspondence with those found in number theory. Some applications of these results are presented, including a permanental extension to MacMahon's master theorem and a derivation of the Ihara zeta function.

Submitted 20 October, 2016; v1 submitted 8 January, 2016; originally announced January 2016.

Journal ref: SIAM Journal on Discrete Mathematics 31-2, pp. 1428-1453 (2017)

arXiv:1505.06101

Hypothesis testing for markovian models with random time observations

Authors: Flavia Barsotti, Anne Philippe, Paul Rochet

Abstract: The aim of this paper is to propose a methodology for testing general hypotheses in a Markovian setting with random sampling. A discrete Markov chain X is observed at random time intervals $τ_k$, assumed to be iid with unknown distribution $μ$. Two test procedures are investigated. The first one is devoted to testing if the transition matrix P of the Markov chain X satisfies specific affine constraints, covering a wide range of situations such as symmetry or sparsity. The second procedure is a goodness-of-fit test on the distribution $μ$, which proves to be consistent under mild assumptions even though the time gaps are not observed. The theoretical results are supported by a Monte Carlo simulation study to show the performance and robustness of the proposed methodologies on specific numerical examples.

Submitted 22 May, 2015; originally announced May 2015.

arXiv:1505.05725

Relations between connected and self-avoiding walks in a digraph

Authors: Thibault Espinasse, Paul Rochet

Abstract: Walks in a directed graph can be given a partially ordered structure that extends to possibly unconnected objects, called hikes. Studying the incidence algebra on this poset reveals unsuspected relations between walks and self-avoiding hikes. These relations are derived by considering truncated versions of the characteristic polynomial of the weighted adjacency matrix, resulting in a collection of matrices whose entries enumerate the self-avoiding hikes of length $\ell$ from one vertex to another.

Submitted 21 December, 2015; v1 submitted 21 May, 2015; originally announced May 2015.

arXiv:1405.0384

Estimating the transition matrix of a Markov chain observed at random times

Authors: Flavia Barsotti, Yohann De Castro, Thibault Espinasse, Paul Rochet

Abstract: In this paper we develop a statistical estimation technique to recover the transition kernel $P$ of a Markov chain $X=(X_m)_{m \in \mathbb N}$ in presence of censored data. We consider the situation where only a sub-sequence of $X$ is available and the time gaps between the observations are iid random variables. Under the assumption that neither the time gaps nor their distribution are known, we provide an estimation method which applies when some transitions in the initial Markov chain $X$ are known to be unfeasible. A consistent estimator of $P$ is derived in closed form as a solution of a minimization problem. The asymptotic performance of the estimator is then discussed in theory and through numerical simulations.

Submitted 2 May, 2014; originally announced May 2014.

arXiv:1204.2763

A Cramér-Rao inequality for non differentiable models

Authors: Thibault Espinasse, Paul Rochet

Abstract: We compute a variance lower bound for unbiased estimators in specified statistical models. The construction of the bound is related to the original Cramér-Rao bound, although it does not require the differentiability of the model. Moreover, we show our efficiency bound to be always greater than the Cramér-Rao bound in smooth models, thus providing a sharper result.

Submitted 12 April, 2012; originally announced April 2012.

arXiv:1202.6469

Bayesian interpretation of Generalized empirical likelihood by maximum entropy

Authors: Paul Rochet

Abstract: We study a parametric estimation problem related to moment condition models. As an alternative to the generalized empirical likelihood (GEL) and the generalized method of moments (GMM), a Bayesian approach to the problem can be adopted, extending the MEM procedure to parametric moment conditions. We show in particular that a large number of GEL estimators can be interpreted as a maximum entropy solution. Moreover, we provide a more general field of applications by proving the method to be robust to approximate moment conditions.

Submitted 29 February, 2012; originally announced February 2012.

arXiv:1105.0490

A Threshold Regularization Method for Inverse Problems

Authors: Paul Rochet

Abstract: A number of regularization methods for discrete inverse problems consist in considering weighted versions of the usual least square solution. However, these so-called filter methods are generally restricted to monotonic transformations, e.g. the Tikhonov regularization or the spectral cut-off. In this paper, we point out that in several cases, non-monotonic sequences of filters are more efficient. We study a regularization method that naturally extends the spectral cut-off procedure to non-monotonic sequences and provide several oracle inequalities, showing the method to be nearly optimal under mild assumptions. Then, we extend the method to inverse problems with noisy operator and provide efficiency results in a newly introduced conditional framework.

Submitted 3 May, 2011; originally announced May 2011.

arXiv:1011.4881

Semiparametric Efficiency of GMM under Approximate Constraints

Authors: Paul Rochet

Abstract: Generalized empirical likelihood and generalized method of moments are well spread methods of resolution of inverse problems in econometrics. Each method defines a specific semiparametric model for which it is possible to calculate efficiency bounds. By this approach, we provide a new proof of Chamberlain's result on optimal GMM. We also discuss conditions under which GMM estimators remain efficient with approximate moment constraints.

Submitted 23 November, 2010; v1 submitted 22 November, 2010; originally announced November 2010.

arXiv:0909.4046

Maximum Entropy Estimation for Survey sampling

Authors: Fabrice Gamboa, Jean-Michel Loubes, Paul Rochet

Abstract: Calibration methods have been widely studied in survey sampling over the last decades. Viewing calibration as an inverse problem, we extend the calibration technique by using a maximum entropy method. Finding the optimal weights is achieved by considering random weights and looking for a discrete distribution which maximizes an entropy under the calibration constraint. This method points a new frame for the computation of such estimates and the investigation of its statistical properties.

Submitted 22 September, 2009; originally announced September 2009.

Comments: 25 pages

arXiv:0906.0562

Regularization with Approximated $L^2$ Maximum Entropy Method

Authors: Jean-Michel Loubes, Paul Rochet

Abstract: We tackle the inverse problem of reconstructing an unknown finite measure $μ$ from a noisy observation of a generalized moment of $μ$ defined as the integral of a continuous and bounded operator $Φ$ with respect to $μ$. When only a quadratic approximation $Φ_m$ of the operator is known, we introduce the $L^2$ approximate maximum entropy solution as a minimizer of a convex functional subject to a sequence of convex constraints. Under several assumptions on the convex functional, the convergence of the approximate solution is established and rates of convergence are provided.

Submitted 2 June, 2009; originally announced June 2009.

Comments: 16 pages

Showing 1–22 of 22 results for author: Rochet, P