Robust and efficient multiple-unit switchback experimentation

Authors: Paul Missault, Lorenzo Masoero, Christian Delbé, Thomas Richardson, Guido Imbens

Abstract: User-randomized A/B testing has emerged as the gold standard for online experimentation. However, when this kind of approach is not feasible due to legal, ethical or practical considerations, experimenters have to consider alternatives like item-randomization. Item-randomization is often met with skepticism due to its poor empirical performance. To fill this gap, in this paper we introduce a novel and rich class of experimental designs, "Regular Balanced Switchback Designs" (RBSDs). At their core, RBSDs work by randomly changing treatment assignments over both time and items. After establishing the properties of our designs in a potential outcomes framework, characterizing assumptions and conditions under which corresponding estimators are resilient to the presence of carryover effects, we show empirically via both realistic simulations and real e-commerce data that RBSDs systematically outperform standard item-randomized and non-balanced switchback approaches by yielding much more accurate estimates of the causal effects of interest without incurring any additional bias.

Submitted 14 June, 2025; originally announced June 2025.

arXiv:2505.19643

Online activity prediction via generalized Indian buffet process models

Authors: Mario Beraha, Lorenzo Masoero, Stefano Favaro, Thomas S. Richardson

Abstract: Online A/B experiments generate millions of user-activity records each day, yet experimenters need timely forecasts to guide roll-outs and safeguard user experience. Motivated by the problem of activity prediction for A/B tests at Amazon, we introduce a Bayesian nonparametric model for predicting both first-time and repeat triggers in web experiments. The model is based on the stable beta-scaled process prior, which allows for capturing heavy-tailed behaviour without strict parametric assumptions. All posterior and predictive quantities are available in closed form, allowing for fast inference even on large-scale datasets. Simulation studies and a retrospective analysis of 1,774 production experiments show improved accuracy in forecasting new users and total triggers compared with state-of-the-art competitors, especially when only a few pilot days are observed. The framework enables shorter tests while preserving calibrated uncertainty estimates. Although motivated by Amazon's experimentation platform, the method extends to other applications that require rapid, distribution-free prediction of sparse count processes.

Submitted 26 May, 2025; originally announced May 2025.

Comments: This paper supersedes the two technical reports by the same authors arXiv:2401.14722 and arXiv:2402.03231

arXiv:2405.18621

Multi-Armed Bandits with Network Interference

Authors: Abhineet Agarwal, Anish Agarwal, Lorenzo Masoero, Justin Whitehouse

Abstract: Online experimentation with interference is a common challenge in modern applications such as e-commerce and adaptive clinical trials in medicine. For example, in online marketplaces, the revenue of a good depends on discounts applied to competing goods. Statistical inference with interference is widely studied in the offline setting, but far less is known about how to adaptively assign treatments to minimize regret. We address this gap by studying a multi-armed bandit (MAB) problem where a learner (e-commerce platform) sequentially assigns one of possible $\mathcal{A}$ actions (discounts) to $N$ units (goods) over $T$ rounds to minimize regret (maximize revenue). Unlike traditional MAB problems, the reward of each unit depends on the treatments assigned to other units, i.e., there is interference across the underlying network of units. With $\mathcal{A}$ actions and $N$ units, minimizing regret is combinatorially difficult since the action space grows as $\mathcal{A}^N$. To overcome this issue, we study a sparse network interference model, where the reward of a unit is only affected by the treatments assigned to $s$ neighboring units. We use tools from discrete Fourier analysis to develop a sparse linear representation of the unit-specific reward $r_n: [\mathcal{A}]^N \rightarrow \mathbb{R}$, and propose simple, linear regression-based algorithms to minimize regret. Importantly, our algorithms achieve provably low regret both when the learner observes the interference neighborhood for all units and when it is unknown.
This significantly generalizes other works on this topic which impose strict conditions on the strength of interference on a known network, and also compare regret to a markedly weaker optimal action. Empirically, we corroborate our theoretical findings via numerical simulations.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2403.02154

Double trouble: Predicting new variant counts across two heterogeneous populations

Authors: Yunyi Shen, Lorenzo Masoero, Joshua G. Schraiber, Tamara Broderick

Abstract: Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they might expect to find in a follow-up study: both the number of new variants shared between the populations and the total across the populations. While many authors have developed prediction methods for the single-population case, we show that these predictions can fare poorly across multiple populations that are heterogeneous. We prove that, surprisingly, a natural extension of a state-of-the-art single-population predictor to multiple populations fails for fundamental reasons. We provide the first predictor for the number of new shared variants and new total variants that can handle heterogeneity in multiple populations. We show that our proposed method works well empirically using real cancer and population genetics data.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.03231

Improved prediction of future user activity in online A/B testing

Authors: Lorenzo Masoero, Mario Beraha, Thomas Richardson, Stefano Favaro

Abstract: In online randomized experiments or A/B tests, accurate predictions of participant inclusion rates are of paramount importance. These predictions not only guide experimenters in optimizing the experiment's duration but also enhance the precision of treatment effect estimates. In this paper we present a novel, straightforward, and scalable Bayesian nonparametric approach for predicting the rate at which individuals will be exposed to interventions within the realm of online A/B testing. Our approach stands out by offering dual prediction capabilities: it forecasts both the quantity of new customers expected in future time windows and, unlike available alternative methods, the number of times they will be observed. We derive closed-form expressions for the posterior distributions of the quantities needed to form predictions about future user activity, thereby bypassing the need for numerical algorithms such as Markov chain Monte Carlo. After a comprehensive exposition of our model, we test its performance on experiments on real and simulated data, where we show its superior performance with respect to existing alternatives in the literature.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.14722

A Nonparametric Bayes Approach to Online Activity Prediction

Authors: Mario Beraha, Lorenzo Masoero, Stefano Favaro, Thomas S. Richardson

Abstract: Accurately predicting the onset of specific activities within defined timeframes holds significant importance in several applied contexts. In particular, accurate prediction of the number of future users that will be exposed to an intervention is an important piece of information for experimenters running online experiments (A/B tests). In this work, we propose a novel approach to predict the number of users that will be active in a given time period, as well as the temporal trajectory needed to attain a desired user participation threshold. We model user activity using a Bayesian nonparametric approach which allows us to capture the underlying heterogeneity in user engagement. We derive closed-form expressions for the number of new users expected in a given period, and a simple Monte Carlo algorithm targeting the posterior distribution of the number of days needed to attain a desired number of users; the latter is important for experimental planning. We illustrate the performance of our approach via several experiments on synthetic and real world data, in which we show that our novel method outperforms existing competitors.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.01264

Multiple Randomization Designs: Estimation and Inference with Interference

Authors: Lorenzo Masoero, Suhas Vijaykumar, Thomas Richardson, James McQueen, Ido Rosen, Brian Burdick, Pat Bajari, Guido Imbens

Abstract: Classical designs of randomized experiments, going back to Fisher and Neyman in the 1930s, still dominate practice even in online experimentation. However, such designs are of limited value for answering standard questions in settings, common in marketplaces, where multiple populations of agents interact strategically, leading to complex patterns of spillover effects. In this paper, we discuss new experimental designs and corresponding estimands to account for and capture these complex spillovers. We derive the finite-sample properties of tractable estimators for main effects, direct effects, and spillovers, and present associated central limit theorems.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2305.01109

Leveraging covariate adjustments at scale in online A/B testing

Authors: Lorenzo Masoero, Doug Hains, James McQueen

Abstract: Companies offering web services routinely run randomized online experiments to estimate the causal impact associated with the adoption of new features and policies on key performance metrics of interest. These experiments are used to estimate a variety of effects: the increase in click rate due to the repositioning of a banner, the impact on subscription rate as a consequence of a discount or special offer, etc. In these settings, even effects whose sizes are very small can have large downstream impacts. The simple difference in means estimator (Splawa-Neyman et al., 1990) is still the standard estimator of choice for many online A/B testing platforms due to its simplicity. This method, however, can fail to detect small effects, even when the experiment contains thousands or millions of observational units. As a by-product of these experiments, however, large amounts of additional data (covariates) are collected. In this paper, we discuss benefits, costs and risks of allowing experimenters to leverage more complicated estimators that make use of covariates when estimating causal effects of interest. We adapt a recently proposed general-purpose algorithm for the estimation of causal effects with covariates to the setting of online A/B tests. Through this paradigm, we implement several covariate-adjusted causal estimators. We thoroughly evaluate their performance at scale, highlighting benefits and shortcomings of different methods.
We show on real experiments how "covariate-adjusted" estimators can (i) lead to more precise quantification of the causal effects of interest and (ii) fix issues related to imbalance across treatment arms - a practical concern often overlooked in the literature. In turn, (iii) these more precise estimates can reduce experimentation time, cutting cost and helping to streamline decision-making processes, allowing for faster adoption of beneficial interventions.

Submitted 1 May, 2023; originally announced May 2023.

Journal ref: 2023 ACM SIGKDD Workshop on Causal Discovery, Prediction and Decision

arXiv:2202.01910

doi 10.1214/22-STS871

Cross-Study Replicability in Cluster Analysis

Authors: Lorenzo Masoero, Emma Thomas, Giovanni Parmigiani, Svitlana Tyekucheva, Lorenzo Trippa

Abstract: In cancer research, clustering techniques are widely used for exploratory analyses and dimensionality reduction, playing a critical role in the identification of novel cancer subtypes, often with direct implications for patient management. As data collected by multiple research groups grows, it is increasingly feasible to investigate the replicability of clustering procedures, that is, their ability to consistently recover biologically meaningful clusters across several datasets. In this paper, we review existing methods to assess replicability of clustering analyses, and discuss a framework for evaluating cross-study clustering replicability, useful when two or more studies are available. These approaches can be applied to any clustering algorithm and can employ different measures of similarity between partitions to quantify replicability, globally (i.e. for the whole sample) as well as locally (i.e. for individual clusters). Using experiments on synthetic and real gene expression data, we illustrate the utility of replicability metrics to evaluate if the same clusters are identified consistently across a collection of datasets.

Submitted 9 May, 2023; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: Accepted for publication in Statistical Science

arXiv:2112.13495

Multiple Randomization Designs

Authors: Patrick Bajari, Brian Burdick, Guido W. Imbens, Lorenzo Masoero, James McQueen, Thomas Richardson, Ido M. Rosen

Abstract: In this study we introduce a new class of experimental designs. In a classical randomized controlled trial (RCT), or A/B test, a randomly selected subset of a population of units (e.g., individuals, plots of land, or experiences) is assigned to a treatment (treatment A), and the remainder of the population is assigned to the control treatment (treatment B). The difference in average outcome by treatment group is an estimate of the average effect of the treatment. However, motivating our study, the setting for modern experiments is often different, with the outcomes and treatment assignments indexed by multiple populations. For example, outcomes may be indexed by buyers and sellers, by content creators and subscribers, by drivers and riders, or by travelers and airlines and travel agents, with treatments potentially varying across these indices. Spillovers or interference can arise from interactions between units across populations. For example, sellers' behavior may depend on buyers' treatment assignment, or vice versa. This can invalidate the simple comparison of means as an estimator for the average effect of the treatment in classical RCTs. We propose new experiment designs for settings in which multiple populations interact. We show how these designs allow us to study questions about interference that cannot be answered by classical randomized experiments. Finally, we develop new statistical methods for analyzing these Multiple Randomization Designs.

Submitted 26 December, 2021; originally announced December 2021.

Comments: 57 pages, 7 figures

MSC Class: 62B15 (Primary) 91B82; 91B26; 91C20; 91B80; 91C20 (Secondary)

ACM Class: J.4; G.3; I.2.6

arXiv:2112.02032

Bayesian nonparametric strategies for power maximization in rare variants association studies

Authors: Lorenzo Masoero, Joshua Schraiber, Tamara Broderick

Abstract: Rare variants are hypothesized to be largely responsible for heritability and susceptibility to disease in humans. So rare variants association studies hold promise for understanding disease. Conversely though, the rareness of the variants poses practical challenges; since these variants are present in few individuals, it can be difficult to develop data-collection and statistical methods that effectively leverage their sparse information. In this work, we develop a novel Bayesian nonparametric model to capture how design choices in rare variants association studies can impact their usefulness. We then show how to use our model to guide design choices under a fixed experimental budget in practice. In particular, we provide a practical workflow and illustrative experiments on simulated data.

Submitted 3 December, 2021; originally announced December 2021.

arXiv:2106.15480

Scaled process priors for Bayesian nonparametric estimation of the unseen genetic variation

Authors: Federico Camerlenghi, Stefano Favaro, Lorenzo Masoero, Tamara Broderick

Abstract: There is a growing interest in the estimation of the number of unseen features, mostly driven by biological applications. A recent work brought out a peculiar property of the popular completely random measures (CRMs) as prior models in Bayesian nonparametric (BNP) inference for the unseen-features problem: for fixed prior's parameters, they all lead to a Poisson posterior distribution for the number of unseen features, which depends on the sampling information only through the sample size. CRMs are thus not a flexible prior model for the unseen-features problem and, while the Poisson posterior distribution may be appealing for analytical tractability and ease of interpretability, its independence from the sampling information makes the BNP approach a questionable oversimplification, with posterior inferences being completely determined by the estimation of unknown prior's parameters. In this paper, we introduce the stable-Beta scaled process (SB-SP) prior, and we show that it allows to enrich the posterior distribution of the number of unseen features arising under CRM priors, while maintaining its analytical tractability and interpretability. That is, the SB-SP prior leads to a negative Binomial posterior distribution, which depends on the sampling information through the sample size and the number of distinct features, with corresponding estimates being simple, linear in the sampling information and computationally efficient.
We apply our BNP approach to synthetic data and to real cancer genomic data, showing that: i) it outperforms the most popular parametric and nonparametric competitors in terms of estimation accuracy; ii) it provides improved coverage for the estimation with respect to a BNP approach under CRM priors.

Submitted 19 February, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

arXiv:2009.10780

doi 10.1214/23-BA1385

Independent finite approximations for Bayesian nonparametric inference

Authors: Tin D. Nguyen, Jonathan Huggins, Lorenzo Masoero, Lester Mackey, Tamara Broderick

Abstract: Completely random measures (CRMs) and their normalizations (NCRMs) offer flexible models in Bayesian nonparametrics. But their infinite dimensionality presents challenges for inference. Two popular finite approximations are truncated finite approximations (TFAs) and independent finite approximations (IFAs). While the former have been well-studied, IFAs lack similarly general bounds on approximation error, and there has been no systematic comparison between the two options. In the present work, we propose a general recipe to construct practical finite-dimensional approximations for homogeneous CRMs and NCRMs, in the presence or absence of power laws. We call our construction the automated independent finite approximation (AIFA). Relative to TFAs, we show that AIFAs facilitate more straightforward derivations and use of parallel computing in approximate inference. We upper bound the approximation error of AIFAs for a wide class of common CRMs and NCRMs -- and thereby develop guidelines for choosing the approximation level. Our lower bounds in key cases suggest that our upper bounds are tight. We prove that, for worst-case choices of observation likelihoods, TFAs are more efficient than AIFAs. Conversely, we find that in real-data experiments with standard likelihoods, AIFAs and TFAs perform similarly. Moreover, we demonstrate that AIFAs can be used for hyperparameter estimation even when other potential IFA options struggle or do not apply.

Submitted 5 November, 2023; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: The paper has been accepted for publication in Bayesian Analysis. Currently, it is posted on Bayesian Analysis Advance Publication

arXiv:1912.05516

doi 10.1093/biomet/asab012

More for less: Predicting and maximizing genetic variant discovery via Bayesian nonparametrics

Authors: Lorenzo Masoero, Federico Camerlenghi, Stefano Favaro, Tamara Broderick

Abstract: While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, then, scientists face a natural trade-off between quantity and quality; they can spend resources to sequence a greater number of genomes (quantity) or spend resources to sequence genomes with increased accuracy (quality). Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible, and thus as many new scientific insights as possible. In this paper, we consider the common setting where scientists have already conducted a pilot study to reveal variants in a genome and are contemplating a follow-up study. We introduce a Bayesian nonparametric methodology to predict the number of new variants in the follow-up study based on the pilot study. When experimental conditions are kept constant between the pilot and follow-up, we demonstrate on real data from the gnomAD project that our prediction is more accurate than three recent proposals, and competitive with a more classic proposal. Unlike existing methods, though, our method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for (i) more realistic predictions and (ii) optimal allocation of a fixed budget between quality and quantity.

Submitted 12 February, 2021; v1 submitted 11 December, 2019; originally announced December 2019.

Showing 1–14 of 14 results for author: Masoero, L