-
Multiple Randomization Designs: Estimation and Inference with Interference
Authors:
Lorenzo Masoero,
Suhas Vijaykumar,
Thomas Richardson,
James McQueen,
Ido Rosen,
Brian Burdick,
Pat Bajari,
Guido Imbens
Abstract:
Classical designs of randomized experiments, going back to Fisher and Neyman in the 1930s, still dominate practice even in online experimentation. However, such designs are of limited value for answering standard questions in settings, common in marketplaces, where multiple populations of agents interact strategically, leading to complex patterns of spillover effects. In this paper, we discuss new experimental designs and corresponding estimands to account for and capture these complex spillovers. We derive the finite-sample properties of tractable estimators for main effects, direct effects, and spillovers, and present associated central limit theorems.
Submitted 2 January, 2024;
originally announced January 2024.
-
Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches
Authors:
Yu Liu,
Runzhe Wan,
James McQueen,
Doug Hains,
Jinxiang Gu,
Rui Song
Abstract:
The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency. Traditionally, experimenters determine AES based on domain knowledge. However, this method becomes impractical for online experimentation services managing numerous experiments, and a more automated approach is hence in great demand. We initiate the study of data-driven AES selection for online experimentation services by introducing two solutions. The first employs a three-layer Gaussian Mixture Model considering the heteroskedasticity across experiments, and it seeks to estimate the true expected effect size among positive experiments. The second method, grounded in utility theory, aims to determine the optimal effect size by striking a balance between the experiment's cost and the precision of decision-making. Through comparisons with baseline methods using both simulated and real data, we showcase the superior performance of the proposed approaches.
Submitted 17 April, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Estimating the Value of Evidence-Based Decision Making
Authors:
Alberto Abadie,
Anish Agarwal,
Guido Imbens,
Siwei Jia,
James McQueen,
Serguei Stepaniants
Abstract:
Business/policy decisions are often based on evidence from randomized experiments and observational studies. In this article we propose an empirical framework to estimate the value of evidence-based decision making (EBDM) and the return on the investment in statistical precision.
Submitted 9 September, 2023; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Leveraging covariate adjustments at scale in online A/B testing
Authors:
Lorenzo Masoero,
Doug Hains,
James McQueen
Abstract:
Companies offering web services routinely run randomized online experiments to estimate the causal impact associated with the adoption of new features and policies on key performance metrics of interest. These experiments are used to estimate a variety of effects: the increase in click rate due to the repositioning of a banner, the impact on subscription rate as a consequence of a discount or special offer, etc. In these settings, even effects whose sizes are very small can have large downstream impacts. The simple difference in means estimator (Splawa-Neyman et al., 1990) is still the standard estimator of choice for many online A/B testing platforms due to its simplicity. This method, however, can fail to detect small effects, even when the experiment contains thousands or millions of observational units. As a by-product of these experiments, however, large amounts of additional data (covariates) are collected. In this paper, we discuss benefits, costs and risks of allowing experimenters to leverage more complicated estimators that make use of covariates when estimating causal effects of interest. We adapt a recently proposed general-purpose algorithm for the estimation of causal effects with covariates to the setting of online A/B tests. Through this paradigm, we implement several covariate-adjusted causal estimators. We thoroughly evaluate their performance at scale, highlighting benefits and shortcomings of different methods. We show on real experiments how "covariate-adjusted" estimators can (i) lead to more precise quantification of the causal effects of interest and (ii) fix issues related to imbalance across treatment arms - a practical concern often overlooked in the literature. In turn, (iii) these more precise estimates can reduce experimentation time, cutting cost and helping to streamline decision-making processes, allowing for faster adoption of beneficial interventions.
Submitted 1 May, 2023;
originally announced May 2023.
-
Multiple Randomization Designs
Authors:
Patrick Bajari,
Brian Burdick,
Guido W. Imbens,
Lorenzo Masoero,
James McQueen,
Thomas Richardson,
Ido M. Rosen
Abstract:
In this study we introduce a new class of experimental designs. In a classical randomized controlled trial (RCT), or A/B test, a randomly selected subset of a population of units (e.g., individuals, plots of land, or experiences) is assigned to a treatment (treatment A), and the remainder of the population is assigned to the control treatment (treatment B). The difference in average outcome by treatment group is an estimate of the average effect of the treatment. However, motivating our study, the setting for modern experiments is often different, with the outcomes and treatment assignments indexed by multiple populations. For example, outcomes may be indexed by buyers and sellers, by content creators and subscribers, by drivers and riders, or by travelers and airlines and travel agents, with treatments potentially varying across these indices. Spillovers or interference can arise from interactions between units across populations. For example, sellers' behavior may depend on buyers' treatment assignment, or vice versa. This can invalidate the simple comparison of means as an estimator for the average effect of the treatment in classical RCTs. We propose new experiment designs for settings in which multiple populations interact. We show how these designs allow us to study questions about interference that cannot be answered by classical randomized experiments. Finally, we develop new statistical methods for analyzing these Multiple Randomization Designs.
Submitted 26 December, 2021;
originally announced December 2021.
-
A Bayesian Model for Online Activity Sample Sizes
Authors:
Thomas Richardson,
Yu Liu,
James McQueen,
Doug Hains
Abstract:
In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period: for example, the number of users who will install a software update, or the number of customers who will use a new feature on a website or participate in an A/B test. In practical settings, there is heterogeneity amongst individuals with regard to the distribution of time until they will initiate. For these reasons it is inappropriate to assume that the number of new individuals observed on successive days will be identically distributed. Given observations on the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation.
Submitted 26 May, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Empirical Bayes for Large-scale Randomized Experiments: a Spectral Approach
Authors:
F. Richard Guo,
James McQueen,
Thomas S. Richardson
Abstract:
Large-scale randomized experiments, sometimes called A/B tests, are increasingly prevalent in many industries. Though such experiments are often analyzed via frequentist $t$-tests, arguably such analyses are deficient: $p$-values are hard to interpret and not easily incorporated into decision-making. As an alternative, we propose an empirical Bayes approach, which assumes that the treatment effects are realized from a "true prior". This requires inferring the prior from previous experiments. Following Robbins, we estimate a family of marginal densities of empirical effects, indexed by the noise scale. We show that this family is characterized by the heat equation. We develop a spectral maximum likelihood estimate based on a Fourier series representation, which can be efficiently computed via convex optimization. In order to select hyperparameters and compare models, we describe two model selection criteria. We demonstrate our method on simulated and real data, and compare posterior inference to that under a Gaussian mixture model of the prior.
Submitted 25 March, 2020; v1 submitted 6 February, 2020;
originally announced February 2020.
-
megaman: Manifold Learning with Millions of points
Authors:
James McQueen,
Marina Meila,
Jacob VanderPlas,
Zhongyue Zhang
Abstract:
Manifold Learning is a class of algorithms seeking a low-dimensional non-linear representation of high-dimensional data. Thus manifold learning algorithms are, at least in theory, most applicable to high-dimensional data and to sample sizes large enough to enable accurate estimation of the manifold. Despite this, most existing manifold learning implementations are not particularly scalable. Here we present a Python package that implements a variety of manifold learning algorithms in a modular and scalable fashion, using fast approximate neighbors searches and fast sparse eigendecompositions. The package incorporates theoretical advances in manifold learning, such as the unbiased Laplacian estimator and the estimation of the embedding distortion by the Riemannian metric method. In benchmarks, even on a single-core desktop computer, our code embeds millions of data points in minutes, and takes just 200 minutes to embed the main sample of galaxy spectra from the Sloan Digital Sky Survey (0.6 million samples in 3750 dimensions), a task which has not previously been possible.
Submitted 8 March, 2016;
originally announced March 2016.