-
Worst-Case Learning under a Multi-fidelity Model
Authors:
Simon Foucart,
Nicolas Hengartner
Abstract:
Inspired by multi-fidelity methods in computer simulations, this article introduces procedures to design surrogates for the input/output relationship of a high-fidelity code. These surrogates should be learned from runs of both the high-fidelity and low-fidelity codes and be accompanied by error guarantees that are deterministic rather than stochastic. For this purpose, the article advocates a framework tied to a theory focusing on worst-case guarantees, namely Optimal Recovery. The multi-fidelity considerations triggered new theoretical results in three scenarios: the globally optimal estimation of linear functionals, the globally optimal approximation of arbitrary quantities of interest in Hilbert spaces, and their locally optimal approximation, still within Hilbert spaces. The latter scenario boils down to the determination of the Chebyshev center for the intersection of two hyperellipsoids. It is worth noting that the mathematical framework presented here, together with its possible extension, seems to be relevant in several other contexts briefly discussed.
Submitted 20 June, 2024;
originally announced June 2024.
-
Computationally Efficient and Error Aware Surrogate Construction for Numerical Solutions of Subsurface Flow Through Porous Media
Authors:
Aleksei G. Sorokin,
Aleksandra Pachalieva,
Daniel O'Malley,
James M. Hyman,
Fred J. Hickernell,
Nicolas W. Hengartner
Abstract:
Limiting the injection rate to restrict the pressure below a threshold at a critical location can be an important goal of simulations that model the subsurface pressure between injection and extraction wells. The pressure is approximated by the solution of Darcy's partial differential equation (PDE) for a given permeability field. The subsurface permeability is modeled as a random field since it is known only up to statistical properties. This induces uncertainty in the computed pressure. Solving the PDE for an ensemble of random permeability simulations enables estimating a probability distribution for the pressure at the critical location. These simulations are computationally expensive, and practitioners often need rapid online guidance for real-time pressure management. An ensemble of numerical PDE solutions is used to construct a Gaussian process regression model that can quickly predict the pressure at the critical location as a function of the extraction rate and permeability realization.
Our first novel contribution is to identify a sampling methodology for the random environment and matching kernel technology for which fitting the Gaussian process regression model scales as O(n log n) instead of the typical O(n^3) rate in the number of samples n used to fit the surrogate. The surrogate model allows almost instantaneous predictions for the pressure at the critical location as a function of the extraction rate and permeability realization. Our second contribution is a novel algorithm to calibrate the uncertainty in the surrogate model to the discrepancy between the true pressure solution of Darcy's equation and the numerical solution. Although our method is derived for building a surrogate for the solution of Darcy's equation with a random permeability field, the framework broadly applies to solutions of other PDEs with random coefficients.
Submitted 20 October, 2023;
originally announced October 2023.
-
Iterative Isotonic Regression
Authors:
Arnaud Guyader,
Nick Hengartner,
Nicolas Jégou,
Eric Matzner-Løber
Abstract:
This article introduces a new nonparametric method for estimating a univariate regression function of bounded variation. The method exploits the Jordan decomposition, which states that a function of bounded variation can be decomposed as the sum of a non-decreasing function and a non-increasing function. This suggests combining the backfitting algorithm for estimating additive functions with isotonic regression for estimating monotone functions. The resulting iterative algorithm is called Iterative Isotonic Regression (I.I.R.). The main technical result in this paper is the consistency of the proposed estimator when the number of iterations $k_n$ grows appropriately with the sample size $n$. The proof requires two auxiliary results that are of interest in their own right: firstly, we generalize the well-known consistency property of isotonic regression to the framework of a non-monotone regression function, and secondly, we relate the backfitting algorithm to von Neumann's algorithm in convex analysis.
Submitted 18 March, 2013;
originally announced March 2013.
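The alternation described in this abstract (backfitting with an isotonic fit for each monotone component) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `pava` and `iterative_isotonic_regression` are ours, the responses are assumed to be already sorted by the covariate, and the update order (non-decreasing part first, then non-increasing part) is one natural reading of the abstract.

```python
def pava(y):
    """Least-squares non-decreasing fit to y (pool-adjacent-violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def iterative_isotonic_regression(y, n_iter):
    """Backfit a non-decreasing part `up` and a non-increasing part `down`.

    Assumption: y is ordered by increasing covariate value.
    """
    n = len(y)
    up = [0.0] * n
    down = [0.0] * n
    for _ in range(n_iter):
        # Non-decreasing fit to the partial residuals y - down.
        up = pava([yi - di for yi, di in zip(y, down)])
        # Non-increasing fit to y - up, obtained as -pava(-(y - up)).
        down = [-v for v in pava([ui - yi for yi, ui in zip(y, up)])]
    return [u + d for u, d in zip(up, down)]


def rss(y, fit):
    """Residual sum of squares between the data and a fitted vector."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
```

Each step is a least-squares fit over a set that contains the previous iterate, so the residual sum of squares cannot increase with `n_iter`. The fit therefore tracks the data more and more closely as the iterations proceed, which is why the number of iterations $k_n$ has to be chosen with the sample size $n$ in mind.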
-
The phase transition in inhomogeneous random intersection graphs
Authors:
Milan Bradonjić,
Aric Hagberg,
Nicolas W. Hengartner,
Nathan Lemons,
Allon G. Percus
Abstract:
We analyze the component evolution in inhomogeneous random intersection graphs when the average degree is close to 1. As the average degree increases, the size of the largest component in the random intersection graph goes through a phase transition. We give bounds on the size of the largest components before and after this transition. We also prove that the largest component after the transition is unique. These results are similar to the phase transition in Erdős-Rényi random graphs; one notable difference is that the size of the jump in the largest component depends on the parameters of the random intersection graph.
Submitted 30 January, 2013;
originally announced January 2013.
-
Component Evolution in General Random Intersection Graphs
Authors:
Milan Bradonjic,
Aric Hagberg,
Nicolas W. Hengartner,
Allon G. Percus
Abstract:
Random intersection graphs (RIGs) are an important random structure with applications in social networks, epidemic networks, blog readership, and wireless sensor networks. RIGs can be interpreted as a model for large randomly formed non-metric data sets. We analyze the component evolution in general RIGs, and give conditions on existence and uniqueness of the giant component. Our techniques generalize existing methods for analysis of component evolution: we analyze survival and extinction properties of a dependent, inhomogeneous Galton-Watson branching process on general RIGs. Our analysis relies on bounding the branching processes and inherits the fundamental concepts of the study of component evolution in Erdős-Rényi graphs. The major challenge comes from the underlying structure of RIGs, which involves both the set of nodes and the set of attributes, as well as the set of different probabilities among the nodes and attributes.
Submitted 29 May, 2010;
originally announced May 2010.
-
Multiplicative Bias Corrected Nonparametric Smoothers
Authors:
Nicolas Hengartner,
Eric Matzner-Løber,
Laurent Rouvière,
Thomas Burr
Abstract:
The paper presents a multiplicative bias reduction estimator for nonparametric regression. The approach consists of applying a multiplicative bias correction to an oversmooth pilot estimator. In Burr et al. [2010], this method was tested for estimating energy spectra. For such data sets, it was observed that the method decreases bias with a negligible increase in variance. In this paper, we study the asymptotic properties of the resulting estimate and prove that this estimate has zero asymptotic bias and the same asymptotic variance as the local linear estimate. Simulations show that our asymptotic results remain valid for modest sample sizes.
Submitted 28 February, 2011; v1 submitted 2 August, 2009;
originally announced August 2009.
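The idea of correcting a pilot smoother multiplicatively can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian-kernel Nadaraya-Watson smoother, the use of one bandwidth for both stages, and the function names `nw_smooth` and `mbc_smooth`. The sketch also assumes the pilot fit is nonzero at every design point, since it divides by it.

```python
import math


def nw_smooth(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (illustrative choice)."""
    fitted = []
    for x0 in x_eval:
        w = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x_train]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w))
    return fitted


def mbc_smooth(x_train, y_train, x_eval, bandwidth):
    """Pilot smoother multiplied by a smoothed ratio of data to pilot fit."""
    # Stage 1: pilot estimate at the design points (assumed nonzero).
    pilot_train = nw_smooth(x_train, y_train, x_train, bandwidth)
    # Stage 2: smooth the ratios y_i / pilot(x_i) to estimate the correction.
    ratios = [yi / pi for yi, pi in zip(y_train, pilot_train)]
    pilot_eval = nw_smooth(x_train, y_train, x_eval, bandwidth)
    correction = nw_smooth(x_train, ratios, x_eval, bandwidth)
    return [p * c for p, c in zip(pilot_eval, correction)]
```

When the pilot already reproduces the data, for example for constant responses, the ratios all equal 1 and the correction leaves the pilot unchanged.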
-
Regression with strongly correlated data
Authors:
C. S. Jones,
J. M. Finn,
N. Hengartner
Abstract:
This paper discusses linear regression of strongly correlated data that arises, for example, in magnetohydrodynamic equilibrium reconstructions. We have proved that, generically, the covariance matrix of the estimated regression parameters for fixed sample size goes to zero as the correlations become unity. That is, in this limit the estimated parameters are known with perfect accuracy. Simple examples are shown to illustrate this effect and the nature of the exceptional cases in which the estimate covariance does not go to zero.
Submitted 27 February, 2007;
originally announced February 2007.
-
How to Choose a Champion
Authors:
E. Ben-Naim,
N. W. Hengartner
Abstract:
League competition is investigated using random processes and scaling techniques. In our model, a weak team can upset a strong team with a fixed probability. Teams play an equal number of head-to-head matches and the team with the largest number of wins is declared to be the champion. The total number of games needed for the best team to win the championship with high certainty, $T$, grows as the cube of the number of teams, $N$, i.e., $T \sim N^3$. This number can be substantially reduced using preliminary rounds where teams play a small number of games and subsequently, only the top teams advance to the next round. When there are $k$ rounds, the total number of games needed for the best team to emerge as champion, $T_k$, scales as follows: $T_k \sim N^{\gamma_k}$ with $\gamma_k = 1/[1-(2/3)^{k+1}]$. For example, $\gamma_k = 9/5, 27/19, 81/65$ for $k = 1, 2, 3$. These results suggest an algorithm for how to infer the best team using a schedule that is linear in $N$. We conclude that league format is an ineffective method of determining the best team, and that sequential elimination from the bottom up is fair and efficient.
Submitted 21 December, 2006;
originally announced December 2006.
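The exponent formula quoted in this abstract can be checked with exact rational arithmetic. The following is a minimal sketch; the function name `gamma` is ours and only the formula itself comes from the abstract.

```python
from fractions import Fraction


def gamma(k: int) -> Fraction:
    """Scaling exponent 1 / (1 - (2/3)^(k+1)) for k preliminary rounds."""
    return 1 / (1 - Fraction(2, 3) ** (k + 1))


# Exponents for k = 1, 2, 3 as exact fractions.
exponents = [gamma(k) for k in (1, 2, 3)]
```

Evaluating the formula gives 9/5, 27/19 and 81/65 for k = 1, 2, 3, as stated. Setting k = 0 gives 3, which agrees with the cubic scaling quoted for the plain league format, and the exponent decreases toward 1 as k grows, consistent with a schedule that is nearly linear in the number of teams.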