-
Stochastic Reweighted Gradient Descent
Authors:
Ayoub El Hanchi,
David A. Stephens
Abstract:
Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA) or the periodic full gradient computation they require (SVRG/SARAH) is manageable. A promising approach to achieving variance reduction while avoiding these drawbacks is the use of importance sampling instead of control variates. While many such methods have been proposed in the literature, directly proving that they improve the convergence of the resulting optimization algorithm has remained elusive. In this work, we propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient). We analyze the convergence of SRG in the strongly convex case and show that, while it does not recover the linear rate of control variate methods, it provably outperforms SGD. We pay particular attention to the time and memory overhead of our proposed method, and design a specialized red-black tree allowing its efficient implementation. Finally, we present empirical results to support our findings.
Submitted 23 March, 2021;
originally announced March 2021.
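
As background for the abstract above, here is a minimal sketch of a generic importance-sampled gradient estimator for a finite sum. It is not the SRG algorithm from the paper (the abstract does not give its update rule); the function names, the toy gradients, and the choice of probabilities proportional to gradient norms are assumptions made for illustration only.

```python
import math
import random

def full_gradient(grads):
    """Average of the per-example gradients (the finite-sum gradient)."""
    n, dim = len(grads), len(grads[0])
    return [sum(g[k] for g in grads) / n for k in range(dim)]

def reweighted_gradient(grads, probs, i):
    """Estimator g_i / (n * p_i); unbiased when i is drawn with probability p_i."""
    scale = 1.0 / (len(grads) * probs[i])
    return [scale * x for x in grads[i]]

def expected_estimate(grads, probs):
    """Exact expectation of the estimator under probs (no sampling)."""
    out = [0.0] * len(grads[0])
    for i, p in enumerate(probs):
        est = reweighted_gradient(grads, probs, i)
        out = [o + p * e for o, e in zip(out, est)]
    return out

def second_moment(grads, probs):
    """Exact E[||estimator||^2] under probs; smaller means lower variance."""
    total = 0.0
    for i, p in enumerate(probs):
        est = reweighted_gradient(grads, probs, i)
        total += p * sum(x * x for x in est)
    return total

# Hypothetical per-example gradients at some fixed iterate (toy values).
grads = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
n = len(grads)

uniform = [1.0 / n] * n
norms = [math.sqrt(sum(x * x for x in g)) for g in grads]
norm_probs = [v / sum(norms) for v in norms]  # proportional to gradient norm

# One stochastic draw, as a sampling-based optimizer would make per step.
rng = random.Random(0)
i = rng.choices(range(n), weights=norm_probs, k=1)[0]
sampled = reweighted_gradient(grads, norm_probs, i)
```

Both sampling distributions give an estimator whose expectation equals the full gradient; in this toy setting the norm-proportional one has the smaller second moment, which is the kind of variance reduction that importance sampling aims for.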
-
Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes
Authors:
Ayoub El Hanchi,
David A. Stephens
Abstract:
Reducing the variance of the gradient estimator is known to improve the convergence rate of stochastic gradient-based optimization and sampling algorithms. One way of achieving variance reduction is to design importance sampling strategies. Recently, the problem of designing such schemes was formulated as an online learning problem with bandit feedback, and algorithms with sub-linear static regret were designed. In this work, we build on this framework and propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(1/t)$ step sizes. We achieve this dynamic regret bound by leveraging our knowledge of the dynamics defined by the algorithm, and combining ideas from online learning and variance-reduced stochastic optimization. We validate empirically the performance of our algorithm and identify settings in which it leads to significant improvements.
Submitted 22 March, 2021;
originally announced March 2021.
-
The role of exchangeability in causal inference
Authors:
Olli Saarela,
David A. Stephens,
Erica E. M. Moodie
Abstract:
Though the notion of exchangeability has been discussed in the causal inference literature under various guises, it has rarely taken its original meaning as a symmetry property of probability distributions. As this property is a standard component of Bayesian inference, we argue that in Bayesian causal inference it is natural to link the causal model, including the notion of confounding and definition of causal contrasts of interest, to the concept of exchangeability. Here we propose a probabilistic between-group exchangeability property as an identifying condition for causal effects, relate it to alternative conditions for unconfounded inferences (commonly stated using potential outcomes) and define causal contrasts in the presence of exchangeability in terms of posterior predictive expectations for further exchangeable units. While our main focus is on a point treatment setting, we also investigate how this reasoning carries over to longitudinal settings.
Submitted 15 December, 2022; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Estimating Sparse Networks with Hubs
Authors:
Annaliza McGillivray,
Abbas Khalili,
David A. Stephens
Abstract:
Graphical modelling techniques based on sparse selection have been applied to infer complex networks in many fields, including biology and medicine, engineering, finance, and social sciences. One structural feature of some of the networks in such applications that poses a challenge for statistical inference is the presence of a small number of strongly interconnected nodes, called hubs. For example, in microbiome research, hubs or microbial taxa play a significant role in maintaining stability of the microbial community structure. In this paper, we investigate the problem of estimating sparse networks in which there are a few highly connected hub nodes. Methods based on L1-regularization have been widely used for performing sparse selection in the graphical modelling context. However, while these methods encourage sparsity, they do not take into account structural information of the network. We introduce a new method for estimating networks with hubs that exploits the ability of (inverse) covariance selection methods to include structural information about the underlying network. Our proposed method is a weighted lasso approach with novel row/column sum weights, which we refer to as the hubs weighted graphical lasso. We establish large sample properties of the method when the number of parameters diverges with the sample size, and evaluate its finite sample performance via extensive simulations. We illustrate the method with an application to microbiome data.
Submitted 1 March, 2020; v1 submitted 19 April, 2019;
originally announced April 2019.
-
Doubly robust dose-response estimation for continuous treatments via generalized propensity score augmented outcome regression
Authors:
Daniel J. Graham,
Emma J. McCoy,
David A. Stephens
Abstract:
This paper constructs a doubly robust estimator for continuous dose-response estimation. An outcome regression model is augmented with a set of inverse generalized propensity score covariates to correct for potential misspecification bias. From the augmented model we can obtain consistent estimates of mean average potential outcomes for distinct strata of the treatment. A polynomial regression is then fitted to these point estimates to derive a Taylor approximation to the continuous dose-response function. The bootstrap is used for variance estimation. Analytical results and simulations show that our approach can provide a good approximation to linear or nonlinear dose-response functions under various sources of misspecification of the outcome regression or propensity score models. Efficiency in finite samples is good relative to minimum variance consistent estimators.
Submitted 16 June, 2015;
originally announced June 2015.
-
Variable Selection in Causal Inference Using Penalization
Authors:
Ashkan Ertefaie,
Masoud Asgharian,
David A. Stephens
Abstract:
In the causal adjustment setting, variable selection techniques based on either the outcome or treatment allocation model can result in the omission of confounders or the inclusion of spurious variables in the propensity score. We propose a variable selection method based on a penalized likelihood which considers the response and treatment assignment models simultaneously. The proposed method facilitates confounder selection in high-dimensional settings. We show that under some conditions our method attains the oracle property. The selected variables are used to form a doubly robust regression estimator of the treatment effect. Simulation results are presented and economic growth data are analyzed.
Submitted 4 June, 2014; v1 submitted 5 November, 2013;
originally announced November 2013.