-
Importance sampling for Sobol' indices estimation
Authors:
Haythem Boucharif,
Jérôme Morio,
Paul Rochet
Abstract:
We propose a new importance sampling framework for the estimation and analysis of Sobol' indices. We show that a Sobol' index defined under a reference input distribution can be consistently estimated from samples drawn from other sampling distributions by reweighting the estimator appropriately to account for the distribution change. We derive the optimal sampling distribution that minimizes the asymptotic variance and demonstrate its strong impact on estimation accuracy. Beyond variance reduction, the framework supports distributional sensitivity analysis via reverse importance sampling, enabling robust exploration of input distribution uncertainty with negligible additional computational cost.
Submitted 8 July, 2025;
originally announced July 2025.
-
Efficient influence functions for Sobol' indices under two designs of experiments
Authors:
Thierry Klein,
Agnès Lagnoux,
Paul Rochet,
Thi Mong Ngoc Nguyen
Abstract:
In this note, we are interested in the asymptotic efficiency of Sobol' indices estimators. After recalling the basis of asymptotic efficiency, we compute the efficient influence functions for Sobol' indices in two different contexts: the Pick-Freeze and the given-data settings.
Submitted 22 July, 2024;
originally announced July 2024.
-
Efficiency of the averaged rank-based estimator for first order Sobol index inference
Authors:
Thierry Klein,
Paul Rochet
Abstract:
Among the many estimators of first order Sobol indices that have been proposed in the literature, the so-called rank-based estimator is arguably the simplest to implement. This estimator can be viewed as the empirical auto-correlation of the response variable sample obtained upon reordering the data by increasing values of the inputs. This simple idea can be extended to higher lags of autocorrelation, thus providing several competing estimators of the same parameter. We show that these estimators can be combined in a simple manner to achieve the theoretical variance efficiency bound asymptotically.
Submitted 9 June, 2023;
originally announced June 2023.
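As an illustration of the rank-based idea described in this abstract, here is a minimal Python sketch; it is not the authors' code. It assumes a circular autocorrelation of the centered output after sorting the data by one input. The function name `rank_sobol_index`, the `lag` parameter, and the toy model `y = x1 + 0.5 * x2` are hypothetical choices made for this example, and the sketch does not implement the combination of lags proposed in the paper.

```python
import numpy as np


def rank_sobol_index(x, y, lag=1):
    """Rank-based estimate of the first-order Sobol index of x for output y.

    Assumption of this sketch: the output is reordered by increasing x and
    the index is estimated by the circular autocorrelation at the given lag.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")      # reorder data by increasing input
    y_centered = y[order] - y.mean()          # center the reordered output
    shifted = np.roll(y_centered, -lag)       # neighbour at distance `lag` (circular)
    return float(np.mean(y_centered * shifted) / np.mean(y_centered ** 2))


# Toy model with independent uniform inputs (illustrative only).
rng = np.random.default_rng(0)
n = 20_000
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
y = x1 + 0.5 * x2

s1 = rank_sobol_index(x1, y)
s2 = rank_sobol_index(x2, y)
print(s1, s2)
```

For this additive toy model the theoretical first-order indices are 0.8 for `x1` and 0.2 for `x2`, so the two estimates should land near those values. Calling the function with `lag=2` or higher gives the competing estimators of the same parameter that the abstract mentions.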
-
Test comparison for Sobol Indices over nested sets of variables
Authors:
Thierry Klein,
Nicolas Peteilh,
Paul Rochet
Abstract:
Sensitivity indices are commonly used to quantify the relative influence of any specific group of input variables on the output of a computer code. One crucial question is then to decide whether a given set of variables has a significant impact on the output. Sobol indices are often used to measure this impact but their estimation can be difficult as they usually require a particular design of experiment. In this work, we take advantage of the monotonicity of Sobol indices with respect to set inclusion to test the influence of some of the input variables. The method does not rely on a direct estimation of the Sobol indices and can be performed under classical iid sampling designs.
Submitted 4 April, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
A coupling of the spectral measures at a vertex
Authors:
Thibault Espinasse,
Paul Rochet
Abstract:
Given the adjacency matrix of an undirected graph, we define a coupling of the spectral measures at the vertices, whose moments count the rooted closed paths in the graph. The resulting joint spectral measure verifies numerous interesting properties that allow to recover minors of analytical functions of the adjacency matrix from its generalized moments. We prove an extension of Obata's Central Limit Theorem in growing star-graphs to the multivariate case and discuss some combinatorial properties using Viennot's heaps of pieces point of view.
Submitted 24 April, 2019;
originally announced April 2019.
-
Averaging of density kernel estimators
Authors:
O. Chernova,
F. Lavancier,
P. Rochet
Abstract:
Averaging provides an alternative to bandwidth selection for density kernel estimation. We propose a procedure to combine linearly several kernel estimators of a density obtained from different, possibly data-driven, bandwidths. The method relies on minimizing an easily tractable approximation of the integrated square error of the combination. It provides, at a small computational cost, a final solution that improves on the initial estimators in most cases. The average estimator is proved to be asymptotically as efficient as the best possible combination (the oracle), with an error term that decreases faster than the minimax rate obtained with separated learning and validation samples. The performances are tested numerically, with results that compare favorably to other existing procedures in terms of mean integrated square errors.
Submitted 4 November, 2019; v1 submitted 26 February, 2018;
originally announced February 2018.
-
An Hopf algebra for counting simple cycles
Authors:
Pierre-Louis Giscard,
Paul Rochet,
Richard Wilson
Abstract:
Simple cycles, also known as self-avoiding polygons, are cycles on graphs which are not allowed to visit any vertex more than once. We present an exact formula for enumerating the simple cycles of any length on any directed graph involving a sum over its induced subgraphs. This result stems from an Hopf algebra, which we construct explicitly, and which provides further means of counting simple cycles. Finally, we obtain a more general theorem asserting that any Lie idempotent can be used to enumerate simple cycles.
Submitted 27 February, 2017; v1 submitted 4 July, 2016;
originally announced July 2016.
-
A tutorial on estimator averaging in spatial point process models
Authors:
Frédéric Lavancier,
P Rochet
Abstract:
Assume that several competing methods are available to estimate a parameter in a given statistical model. The aim of estimator averaging is to provide a new estimator, built as a linear combination of the initial estimators, that achieves better properties, under the quadratic loss, than each individual initial estimator. This contribution provides an accessible and clear overview of the method, and investigates its performances on standard spatial point process models. It is demonstrated that the average estimator clearly improves on standard procedures for the considered models. For each example, the code to implement the method with the R software (which only consists of few lines) is provided.
Submitted 7 March, 2017; v1 submitted 4 July, 2016;
originally announced July 2016.
-
The Mean/Max Statistic in Extreme Value Analysis
Authors:
Paul Rochet,
Isabel Serra
Abstract:
Most extreme events in real life can be faithfully modeled as random realizations from a Generalized Pareto distribution, which depends on two parameters: the scale and the shape. In many actual situations, one is mostly concerned with the shape parameter, also called tail index, as it contains the main information on the likelihood of extreme events. In this paper, we show that the mean/max statistic, that is the empirical mean divided by the maximal value of the sample, constitutes an ideal normalization to study the tail index independently of the scale. This statistic appears naturally when trying to distinguish between uniform and exponential distributions, the two transitional phases of the Generalized Pareto model. We propose a simple methodology based on the mean/max statistic to detect, classify and infer on the tail of the distribution of a sample. Applications to seismic events and detection of saturation in experimental measurements are presented.
Submitted 29 June, 2016;
originally announced June 2016.
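The mean/max statistic defined in this abstract (empirical mean divided by the sample maximum) is simple enough to sketch in a few lines of Python. This is an illustrative sketch only, not the authors' methodology: the function name `mean_max_statistic`, the input checks, and the simulated samples are assumptions made for this example.

```python
import numpy as np


def mean_max_statistic(sample):
    """Empirical mean divided by the maximal value of a nonnegative sample."""
    sample = np.asarray(sample, dtype=float)
    if sample.size == 0:
        raise ValueError("sample must not be empty")
    if np.any(sample < 0) or sample.max() <= 0:
        raise ValueError("sample must be nonnegative with a positive maximum")
    return float(sample.mean() / sample.max())


# Simulated samples (illustrative only).
rng = np.random.default_rng(0)
uniform_sample = rng.uniform(0.0, 1.0, size=10_000)
exponential_sample = rng.exponential(scale=1.0, size=10_000)

t_uniform = mean_max_statistic(uniform_sample)
t_exponential = mean_max_statistic(exponential_sample)
t_rescaled = mean_max_statistic(7.5 * exponential_sample)
print(t_uniform, t_exponential, t_rescaled)
```

Rescaling the sample leaves the statistic unchanged up to floating-point rounding, which is the scale-free property the abstract relies on. For a large uniform sample the statistic is close to 1/2, while for an exponential sample it is much smaller because the maximum grows with the sample size.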
-
A new class of graphs that satisfies the Chen-Chvátal Conjecture
Authors:
Pierre Aboulker,
Martin Matamala,
Paul Rochet,
Jose Zamora
Abstract:
A well-known combinatorial theorem says that a set of n non-collinear points in the plane determines at least n distinct lines. Chen and Chvátal conjectured that this theorem extends to metric spaces, with an appropriated definition of line. In this work we prove a slightly stronger version of Chen and Chvátal conjecture for a family of graphs containing chordal graphs and distance-hereditary graphs.
Submitted 20 June, 2016;
originally announced June 2016.
-
Enumerating simple paths from connected induced subgraphs
Authors:
Pierre-Louis Giscard,
Paul Rochet
Abstract:
We present an exact formula for the ordinary generating series of the simple paths between any two vertices of a graph. Our formula involves the adjacency matrix of the connected induced subgraphs and remains valid on weighted and directed graphs. As a particular case, we obtain a relation linking the Hamiltonian paths and cycles of a graph to its dominating connected sets.
Submitted 5 December, 2017; v1 submitted 1 June, 2016;
originally announced June 2016.
-
Reconstructing undirected graphs from eigenspaces
Authors:
Yohann De Castro,
Thibault Espinasse,
Paul Rochet
Abstract:
In this paper, we aim at recovering an undirected weighted graph of $N$ vertices from the knowledge of a perturbed version of the eigenspaces of its adjacency matrix $W$. For instance, this situation arises for stationary signals on graphs or for Markov chains observed at random times. Our approach is based on minimizing a cost function given by the Frobenius norm of the commutator $\mathsf{A} \mathsf{B}-\mathsf{B} \mathsf{A}$ between symmetric matrices $\mathsf{A}$ and $\mathsf{B}$.
In the Erdős-Rényi model with no self-loops, we show that identifiability (i.e., the ability to reconstruct $W$ from the knowledge of its eigenspaces) follows a sharp phase transition on the expected number of edges with threshold function $N\log N/2$.
Given an estimation of the eigenspaces based on a $n$-sample, we provide support selection procedures from theoretical and practical point of views. In particular, when deleting an edge from the active support, our study unveils that our test statistic is the order of $\mathcal O(1/n)$ when we overestimate the true support and lower bounded by a positive constant when the estimated support is smaller than the true support. This feature leads to a powerful practical support estimation procedure. Simulated and real life numerical experiments assert our new methodology.
Submitted 15 March, 2017; v1 submitted 26 March, 2016;
originally announced March 2016.
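The cost function named in this abstract, the Frobenius norm of the commutator $\mathsf{A}\mathsf{B}-\mathsf{B}\mathsf{A}$, can be illustrated with a short Python sketch. It only evaluates the cost and does not reproduce the paper's minimization or support selection procedures. The function name `commutator_cost` and the random test matrices are hypothetical choices for this example.

```python
import numpy as np


def commutator_cost(a, b):
    """Frobenius norm of the commutator AB - BA of two square matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a @ b - b @ a, ord="fro"))


rng = np.random.default_rng(0)

# Random symmetric matrix standing in for a weighted adjacency matrix W.
m = rng.normal(size=(5, 5))
w = (m + m.T) / 2

# A polynomial in W has the same eigenvectors as W, so it commutes with W.
same_eigenspaces = w @ w + 2.0 * w

# An unrelated random symmetric matrix generally does not commute with W.
m2 = rng.normal(size=(5, 5))
other = (m2 + m2.T) / 2

cost_shared = commutator_cost(w, same_eigenspaces)
cost_other = commutator_cost(w, other)
print(cost_shared, cost_other)
```

The cost vanishes, up to rounding error, when the two symmetric matrices share their eigenvectors, and is strictly positive for a generic pair. This is why the commutator norm is a natural measure of agreement between a candidate matrix and a given set of eigenspaces.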
-
Algebraic combinatorics on trace monoids: extending number theory to walks on graphs
Authors:
P. -L Giscard,
P Rochet
Abstract:
Partially commutative monoids provide a powerful tool to study graphs, viewing walks as words whose letters, the edges of the graph, obey a specific commutation rule. A particular class of traces emerges from this framework, the hikes, whose alphabet is the set of simple cycles on the graph. We show that hikes characterize undirected graphs uniquely, up to isomorphism, and satisfy remarkable algebraic properties such as the existence and uniqueness of a prime factorization. Because of this, the set of hikes partially ordered by divisibility hosts a plethora of relations in direct correspondence with those found in number theory. Some applications of these results are presented, including a permanental extension to MacMahon's master theorem and a derivation of the Ihara zeta function.
Submitted 20 October, 2016; v1 submitted 8 January, 2016;
originally announced January 2016.
-
Hypothesis testing for markovian models with random time observations
Authors:
Flavia Barsotti,
Anne Philippe,
Paul Rochet
Abstract:
The aim of this paper is to propose a methodology for testing general hypothesis in a Markovian setting with random sampling. A discrete Markov chain X is observed at random time intervals $τ_k$, assumed to be iid with unknown distribution $μ$. Two test procedures are investigated. The first one is devoted to testing if the transition matrix P of the Markov chain X satisfies specific affine constraints, covering a wide range of situations such as symmetry or sparsity. The second procedure is a goodness-of-fit test on the distribution $μ$, which reveals to be consistent under mild assumptions even though the time gaps are not observed. The theoretical results are supported by a Monte Carlo simulation study to show the performance and robustness of the proposed methodologies on specific numerical examples.
Submitted 22 May, 2015;
originally announced May 2015.
-
Relations between connected and self-avoiding walks in a digraph
Authors:
Thibault Espinasse,
Paul Rochet
Abstract:
Walks in a directed graph can be given a partially ordered structure that extends to possibly unconnected objects, called hikes. Studying the incidence algebra on this poset reveals unsuspected relations between walks and self-avoiding hikes. These relations are derived by considering truncated versions of the characteristic polynomial of the weighted adjacency matrix, resulting in a collection of matrices whose entries enumerate the self-avoiding hikes of length $\ell$ from one vertex to another.
Submitted 21 December, 2015; v1 submitted 21 May, 2015;
originally announced May 2015.
-
Estimating the transition matrix of a Markov chain observed at random times
Authors:
Flavia Barsotti,
Yohann De Castro,
Thibault Espinasse,
Paul Rochet
Abstract:
In this paper we develop a statistical estimation technique to recover the transition kernel $P$ of a Markov chain $X=(X_m)_{m \in \mathbb N}$ in presence of censored data. We consider the situation where only a sub-sequence of $X$ is available and the time gaps between the observations are iid random variables. Under the assumption that neither the time gaps nor their distribution are known, we provide an estimation method which applies when some transitions in the initial Markov chain $X$ are known to be unfeasible. A consistent estimator of $P$ is derived in closed form as a solution of a minimization problem. The asymptotic performance of the estimator is then discussed in theory and through numerical simulations.
Submitted 2 May, 2014;
originally announced May 2014.
-
A Cramér-Rao inequality for non differentiable models
Authors:
Thibault Espinasse,
Paul Rochet
Abstract:
We compute a variance lower bound for unbiased estimators in specified statistical models. The construction of the bound is related to the original Cramér-Rao bound, although it does not require the differentiability of the model. Moreover, we show our efficiency bound to be always greater than the Cramér-Rao bound in smooth models, thus providing a sharper result.
Submitted 12 April, 2012;
originally announced April 2012.
-
Bayesian interpretation of Generalized empirical likelihood by maximum entropy
Authors:
Paul Rochet
Abstract:
We study a parametric estimation problem related to moment condition models. As an alternative to the generalized empirical likelihood (GEL) and the generalized method of moments (GMM), a Bayesian approach to the problem can be adopted, extending the MEM procedure to parametric moment conditions. We show in particular that a large number of GEL estimators can be interpreted as a maximum entropy solution. Moreover, we provide a more general field of applications by proving the method to be robust to approximate moment conditions.
Submitted 29 February, 2012;
originally announced February 2012.
-
A Threshold Regularization Method for Inverse Problems
Authors:
Paul Rochet
Abstract:
A number of regularization methods for discrete inverse problems consist in considering weighted versions of the usual least square solution. However, these so-called filter methods are generally restricted to monotonic transformations, e.g. the Tikhonov regularization or the spectral cut-off. In this paper, we point out that in several cases, non-monotonic sequences of filters are more efficient. We study a regularization method that naturally extends the spectral cut-off procedure to non-monotonic sequences and provide several oracle inequalities, showing the method to be nearly optimal under mild assumptions. Then, we extend the method to inverse problems with noisy operator and provide efficiency results in a newly introduced conditional framework.
Submitted 3 May, 2011;
originally announced May 2011.
-
Semiparametric Efficiency of GMM under Approximate Constraints
Authors:
Paul Rochet
Abstract:
Generalized empirical likelihood and generalized method of moments are well spread methods of resolution of inverse problems in econometrics. Each method defines a specific semiparametric model for which it is possible to calculate efficiency bounds. By this approach, we provide a new proof of Chamberlain's result on optimal GMM. We also discuss conditions under which GMM estimators remain efficient with approximate moment constraints.
Submitted 23 November, 2010; v1 submitted 22 November, 2010;
originally announced November 2010.
-
Maximum Entropy Estimation for Survey sampling
Authors:
Fabrice Gamboa,
Jean-Michel Loubes,
Paul Rochet
Abstract:
Calibration methods have been widely studied in survey sampling over the last decades. Viewing calibration as an inverse problem, we extend the calibration technique by using a maximum entropy method. Finding the optimal weights is achieved by considering random weights and looking for a discrete distribution which maximizes an entropy under the calibration constraint. This method points a new frame for the computation of such estimates and the investigation of its statistical properties.
Submitted 22 September, 2009;
originally announced September 2009.
-
Regularization with Approximated $L^2$ Maximum Entropy Method
Authors:
Jean-Michel Loubes,
Paul Rochet
Abstract:
We tackle the inverse problem of reconstructing an unknown finite measure $μ$ from a noisy observation of a generalized moment of $μ$ defined as the integral of a continuous and bounded operator $Φ$ with respect to $μ$. When only a quadratic approximation $Φ_m$ of the operator is known, we introduce the $L^2$ approximate maximum entropy solution as a minimizer of a convex functional subject to a sequence of convex constraints. Under several assumptions on the convex functional, the convergence of the approximate solution is established and rates of convergence are provided.
Submitted 2 June, 2009;
originally announced June 2009.