-
Datasets for Online Controlled Experiments
Authors:
C. H. Bryan Liu,
Ângelo Cardoso,
Paul Couturier,
Emma J. McCoy
Abstract:
Online Controlled Experiments (OCE) are the gold standard to measure impact and guide decisions for digital products and services. Despite many methodological advances in this area, the scarcity of public datasets and the lack of a systematic review and categorization hinder its development. We present the first survey and taxonomy for OCE datasets, which highlight the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach that allows improvements to be deployed, or degrading changes to be rolled back, quickly. We release the first such dataset, containing daily checkpoints of decision metrics from multiple real experiments run on a global e-commerce platform. The dataset design is guided by a broader discussion on data requirements for common statistical tests used in digital experimentation. We demonstrate how to use the dataset in the adaptive stopping scenario with sequential and Bayesian hypothesis tests, and learn the relevant parameters for each approach.
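A minimal sketch of one common sequential test for adaptive stopping, the mixture sequential probability ratio test (mSPRT), evaluated at daily checkpoints. This is not necessarily the test or parameterisation used in the paper. Assumptions: each checkpoint holds the cumulative per-arm sample size `n` and the cumulative difference in means; that difference is approximately normal with known per-pair variance `variance` (for example Var(treatment) + Var(control)); and the mixing variance `tau2` is the kind of parameter one would learn from past experiments. The function names and checkpoint values are hypothetical.

```python
import math


def msprt_statistic(n, mean_diff, variance, tau2, theta0=0.0):
    """Mixture SPRT statistic for a difference in means.

    Assumes the observed difference in means after n observations per arm is
    approximately N(theta, variance / n), with a N(theta0, tau2) mixing
    distribution over the true effect theta under the alternative.
    """
    d = mean_diff - theta0
    denom = variance + n * tau2
    return math.sqrt(variance / denom) * math.exp(
        (n * n * tau2 * d * d) / (2.0 * variance * denom)
    )


def first_stopping_day(checkpoints, variance, tau2, alpha=0.05):
    """Return the index of the first daily checkpoint at which H0 is rejected.

    `checkpoints` is a list of (n, mean_diff) tuples, one per day, holding the
    cumulative sample size per arm and the cumulative difference in means.
    The experiment stops once the statistic reaches 1 / alpha; returns None
    if it never does.
    """
    threshold = 1.0 / alpha
    for day, (n, mean_diff) in enumerate(checkpoints):
        if msprt_statistic(n, mean_diff, variance, tau2) >= threshold:
            return day
    return None
```

With the hypothetical checkpoints `[(100, 0.01), (300, 0.08), (1000, 0.1)]`, `variance=1.0` and `tau2=0.01`, the statistic stays below 1/alpha = 20 on the first two days and exceeds it on the third, so `first_stopping_day` returns 2. Because the mixture likelihood ratio is a nonnegative martingale with mean 1 under H0, stopping when it first reaches 1/alpha keeps the type I error at most alpha however often the checkpoints are inspected.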
Submitted 14 January, 2022; v1 submitted 19 November, 2021;
originally announced November 2021.
-
Competing Against Equilibria in Zero-Sum Games with Evolving Payoffs
Authors:
Adrian Rivera Cardoso,
Jacob Abernethy,
He Wang,
Huan Xu
Abstract:
We study the problem of repeated play in a zero-sum game in which the payoff matrix may change, in a possibly adversarial fashion, on each round; we call these Online Matrix Games. Finding the Nash Equilibrium (NE) of a two-player zero-sum game is core to many problems in statistics, optimization, and economics, and for a fixed game matrix this can be easily reduced to solving a linear program. But when the payoff matrix evolves over time, our goal is to find a sequential algorithm that can compete with, in a certain sense, the NE of the long-term-averaged payoff matrix. We design an algorithm with small NE regret; that is, we ensure that the long-term payoff of both players is close to the minimax optimum in hindsight. Our algorithm achieves near-optimal dependence with respect to the number of rounds and depends poly-logarithmically on the number of available actions of the players. Additionally, we show that the naive reduction, where each player simply minimizes its own regret, fails to achieve the stated objective regardless of which algorithm is used. We also consider the so-called bandit setting, where the feedback is significantly limited, and we provide an algorithm with small NE regret using one-point estimates of each payoff matrix.
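For contrast with the evolving-matrix setting, the sketch below illustrates the classical fixed-matrix baseline: when both players run a no-regret algorithm such as Hedge (multiplicative weights) against each other, their time-averaged strategies approach a Nash equilibrium. This is not the paper's algorithm; the abstract's point is that this kind of naive per-player reduction fails once the matrix evolves. The matrix, step size and function names are illustrative assumptions.

```python
import math


def softmax(scores, eta):
    """Exponential-weights distribution; subtracting the max avoids overflow."""
    top = max(scores)
    weights = [math.exp(eta * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def hedge_self_play(A, T, eta):
    """Both players run Hedge on a FIXED payoff matrix A (row maximises x^T A y).

    Returns the time-averaged mixed strategies of the row and column players.
    """
    m, n = len(A), len(A[0])
    row_score = [0.0] * m  # cumulative payoff of each row action
    col_score = [0.0] * n  # cumulative negated payoff of each column action
    x_avg, y_avg = [0.0] * m, [0.0] * n
    for _ in range(T):
        x = softmax(row_score, eta)
        y = softmax(col_score, eta)
        for i in range(m):
            x_avg[i] += x[i] / T
        for j in range(n):
            y_avg[j] += y[j] / T
        # Simultaneous full-information updates against the opponent's mix.
        for i in range(m):
            row_score[i] += sum(A[i][j] * y[j] for j in range(n))
        for j in range(n):
            col_score[j] -= sum(x[i] * A[i][j] for i in range(m))
    return x_avg, y_avg


def duality_gap(A, x, y):
    """max_i (A y)_i - min_j (x^T A)_j; zero exactly at a Nash equilibrium."""
    m, n = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    best_col = min(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))
    return best_row - best_col
```

For the hypothetical matrix `A = [[2.0, -1.0], [-1.0, 1.0]]`, the unique NE has each player put probability 0.4 on its first action (game value 0.2). The duality gap of the averaged strategies is bounded by the sum of the two players' average regrets, so it shrinks as `T` grows with a suitably tuned `eta`.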
Submitted 17 July, 2019;
originally announced July 2019.
-
Large Scale Markov Decision Processes with Changing Rewards
Authors:
Adrian Rivera Cardoso,
He Wang,
Huan Xu
Abstract:
We consider Markov Decision Processes (MDPs) where the rewards are unknown and may change in an adversarial manner. We provide an algorithm that achieves a state-of-the-art regret bound of $O(\sqrt{\tau(\ln|S|+\ln|A|)T}\ln(T))$, where $S$ is the state space, $A$ is the action space, $\tau$ is the mixing time of the MDP, and $T$ is the number of periods. The algorithm's computational complexity is polynomial in $|S|$ and $|A|$ per period. We then consider a setting often encountered in practice, where the state space of the MDP is too large to allow for exact solutions. By approximating the state-action occupancy measures with a linear architecture of dimension $d\ll|S|$, we propose a modified algorithm with computational complexity polynomial in $d$. We also prove a regret bound for this modified algorithm, which, to the best of our knowledge, is the first $\tilde{O}(\sqrt{T})$ regret bound for large-scale MDPs with changing rewards.
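Two notational remarks, as a worked sketch. The two logarithms in the bound combine into the log of the number of state-action pairs, and a linear architecture of dimension $d$ is commonly written as a feature matrix times a weight vector. The symbols $q$ (occupancy measure), $\Phi$ (feature matrix) and $\theta$ (weights) are assumed notation, not taken from the abstract.

$$
O\!\left(\sqrt{\tau\,(\ln|S|+\ln|A|)\,T}\,\ln T\right)
= O\!\left(\sqrt{\tau\,T\,\ln\bigl(|S|\,|A|\bigr)}\,\ln T\right),
\qquad
q \approx \Phi\theta,\quad \Phi \in \mathbb{R}^{|S||A|\times d},\ \theta \in \mathbb{R}^{d}.
$$

Under this parameterisation the decision variable has $d$ entries rather than $|S||A|$, which is consistent with the modified algorithm's complexity being polynomial in $d$.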
Submitted 25 May, 2019;
originally announced May 2019.
-
Risk-Averse Stochastic Convex Bandit
Authors:
Adrian Rivera Cardoso,
Huan Xu
Abstract:
Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm that is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk-aversion in the online convex bandit problem.
Submitted 1 October, 2018;
originally announced October 2018.
-
A Recurrent Neural Network Survival Model: Predicting Web User Return Time
Authors:
Georg L. Grob,
Ângelo Cardoso,
C. H. Bryan Liu,
Duncan A. Little,
Benjamin Paul Chamberlain
Abstract:
The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state-of-the-art approaches to this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but cannot be directly trained with examples of non-returning users, who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state-of-the-art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset, with a superior ability to discriminate between returning and non-returning users than either method applied in isolation.
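The key obstacle named above is that non-returning users have no return-time target. A standard survival-analysis remedy is a censored likelihood: returning users contribute the density $f(t)$ of their return time, and non-returning users contribute only the survival probability $S(t)$ up to the end of observation. The sketch below assumes a Weibull return-time distribution, a common choice in RNN survival models but not confirmed by the abstract; in such a model the network would output `scale` and `shape` per user. Function names are hypothetical.

```python
import math


def weibull_censored_nll(t, observed, scale, shape):
    """Negative log-likelihood of one return time under a Weibull model.

    t: time since the last visit (> 0).
    observed: True if the user returned at time t; False if the user had not
        returned by t (right-censored, i.e. a non-returning user).
    scale, shape: Weibull parameters (lambda, k), e.g. predicted per user.
    """
    z = (t / scale) ** shape          # cumulative hazard H(t)
    log_survival = -z                 # log S(t) = -H(t)
    if not observed:
        # Censored: we only know the user had not returned by time t.
        return -log_survival
    # log h(t) = log(k / lambda) + (k - 1) * log(t / lambda)
    log_hazard = math.log(shape / scale) + (shape - 1) * math.log(t / scale)
    # log f(t) = log h(t) + log S(t)
    return -(log_hazard + log_survival)


def batch_nll(records):
    """Mean loss over (t, observed, scale, shape) tuples."""
    return sum(weibull_censored_nll(*r) for r in records) / len(records)
```

With `shape=1` the Weibull reduces to an exponential distribution, so for `t=1`, `scale=2` a returning user costs $\ln 2 + 0.5$ and a censored user costs $0.5$. Minimising `batch_nll` lets both kinds of user drive the gradient, which is what a plain regression target on return time cannot do.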
Submitted 11 July, 2018;
originally announced July 2018.
-
Differentially Private Online Submodular Optimization
Authors:
Adrian Rivera Cardoso,
Rachel Cummings
Abstract:
In this paper we develop the first algorithms for online submodular minimization that preserve differential privacy under full information feedback and bandit feedback. A sequence of $T$ submodular functions over a collection of $n$ elements arrives online, and at each timestep the algorithm must choose a subset of $[n]$ before seeing the function. The algorithm incurs a cost equal to the function evaluated on the chosen set, and seeks to choose a sequence of sets that achieves low expected regret.
Our first result is in the full information setting, where the algorithm can observe the entire function after making its decision at each timestep. We give an algorithm in this setting that is $\varepsilon$-differentially private and achieves expected regret $\tilde{O}\left(\frac{n^{3/2}\sqrt{T}}{\varepsilon}\right)$. This algorithm works by relaxing the submodular function to a convex function using the Lovász extension, and then simulating an algorithm for differentially private online convex optimization.
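The relaxation step can be made concrete. For $x \in [0,1]^n$, sort the coordinates in decreasing order $x_{\sigma(1)} \ge \dots \ge x_{\sigma(n)}$ and let $S_i = \{\sigma(1), \dots, \sigma(i)\}$; the Lovász extension is $f(\emptyset) + \sum_i x_{\sigma(i)}\,(f(S_i) - f(S_{i-1}))$, and the vector of marginal gains $f(S_i) - f(S_{i-1})$ is a subgradient, which is what a convex online optimiser consumes. The sketch below computes both; the private online convex optimisation step is not shown, and the example set function is hypothetical.

```python
def lovasz_extension(f, x):
    """Evaluate the Lovász extension of a set function f at x in [0, 1]^n.

    f takes a frozenset of indices in range(n). Returns (value, subgradient),
    where subgradient[i] is the marginal gain of element i in the greedy
    order induced by sorting x in decreasing order.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i], reverse=True)
    prev = f(frozenset())
    value = prev                      # f(empty set) term; 0 if f is normalised
    subgrad = [0.0] * n
    prefix = set()
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        subgrad[i] = cur - prev       # marginal gain of adding element i
        value += x[i] * subgrad[i]
        prev = cur
    return value, subgrad
```

For the submodular function $f(S) = \min(|S|, 1)$, the extension at $x = (0.2, 0.7, 0.5)$ equals $\max_i x_i = 0.7$, and on any 0/1 indicator vector it coincides with $f$ on the corresponding set, which is why minimising the convex relaxation is meaningful for the original set function.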
Our second result is in the bandit setting, where the algorithm can only see the cost incurred by its chosen set, and does not have access to the entire function. This setting is significantly more challenging because the algorithm does not receive enough information to compute the Lovász extension or its subgradients. Instead, we construct an unbiased estimate using single-point estimation, and then simulate private online convex optimization using this estimate. Our algorithm using bandit feedback is $\varepsilon$-differentially private and achieves expected regret $\tilde{O}\left(\frac{n^{3/2}T^{3/4}}{\varepsilon}\right)$.
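The abstract does not spell out its estimator, so the sketch below shows the standard single-point technique of Flaxman, Kalai and McMahan as an illustration; whether the authors use exactly this construction is an assumption. One function value at a randomly perturbed point, $\frac{n}{\delta} f(x + \delta u)\,u$ with $u$ uniform on the unit sphere, is an unbiased estimate of the gradient of the smoothed function $\hat f(x) = \mathbb{E}_v[f(x + \delta v)]$, $v$ uniform in the unit ball. In the paper's setting $f$ would be the Lovász extension; here it is any callable, and the names are hypothetical.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample uniformly from the unit sphere by normalising a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def one_point_gradient(f, x, delta, rng):
    """One-point gradient estimate from a single function evaluation.

    Returns (dim / delta) * f(x + delta * u) * u for a random unit vector u.
    Its expectation is the gradient of the smoothed function
    f_hat(x) = E_v[f(x + delta * v)], v uniform in the unit ball.
    """
    dim = len(x)
    u = random_unit_vector(dim, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])  # the only query to f
    return [dim / delta * value * ui for ui in u]
```

For a linear function $f(z) = c \cdot z$ smoothing changes nothing, so averaging many estimates should recover $c$; individual estimates are very noisy, which is one reason the bandit regret bound above degrades from $\sqrt{T}$ to $T^{3/4}$.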
Submitted 6 July, 2018;
originally announced July 2018.
-
Product Characterisation towards Personalisation: Learning Attributes from Unstructured Data to Recommend Fashion Products
Authors:
Ângelo Cardoso,
Fabio Daolio,
Saúl Vargas
Abstract:
In this paper, we describe a solution to tackle a common set of challenges in e-commerce, which arise from the fact that new products are continually being added to the catalogue. The challenges involve properly personalising the customer experience, forecasting demand and planning the product range. We argue that the foundational piece to solve all of these problems is having consistent and detailed information about each product, information that is rarely available or consistent given the multitude of suppliers and types of products. We describe in detail the architecture and methodology implemented at ASOS, one of the world's largest fashion e-commerce retailers, to tackle this problem. We then show how this quantitative understanding of the products can be leveraged to improve recommendations in a hybrid recommender system approach.
Submitted 20 March, 2018;
originally announced March 2018.
-
Generalising Random Forest Parameter Optimisation to Include Stability and Cost
Authors:
C. H. Bryan Liu,
Benjamin Paul Chamberlain,
Duncan A. Little,
Angelo Cardoso
Abstract:
Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, the parameters of random forests must be carefully tuned. This is usually done by choosing values that minimize the prediction error on a held-out dataset. We argue that error reduction is only one of several metrics that must be considered when optimizing random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forest predictions, which we argue is key for scenarios that require successive predictions. We motivate the need for multi-criteria optimization by showing that in practical applications, simply choosing the parameters that lead to the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimize this multi-criteria trade-off, we present a new framework that efficiently finds a principled balance between error, stability and cost using Bayesian optimisation. The pitfalls of optimising forest parameters purely for error reduction are demonstrated using two publicly available real-world datasets. We show that our framework leads to parameter settings that are markedly different from the values discovered by error reduction metrics.
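The abstract does not define its stability metric, so the sketch below uses a hypothetical stand-in: the variance of each sample's prediction across independently trained forests (for example, different random seeds), averaged over samples. It also combines error, instability and cost with a simple weighted sum; the paper instead balances the three criteria with Bayesian optimisation, which is not reproduced here. All names and weights are assumptions.

```python
from statistics import fmean, pvariance


def prediction_instability(preds_by_run):
    """Mean, over samples, of the variance of predictions across runs.

    preds_by_run[r][s] is the prediction for sample s from the model trained
    in independent run r. A value of 0 means perfectly stable predictions.
    """
    per_sample = zip(*preds_by_run)  # regroup predictions by sample
    return fmean(pvariance(list(p)) for p in per_sample)


def scalarised_score(error, instability, cost, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three criteria; lower is better."""
    w_err, w_stab, w_cost = weights
    return w_err * error + w_stab * instability + w_cost * cost
```

Any search over forest parameters (number of trees, depth, and so on) can then minimise `scalarised_score` instead of error alone; the weights encode how much stability and cost matter relative to accuracy for a given application.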
Submitted 13 July, 2017; v1 submitted 29 June, 2017;
originally announced June 2017.
-
Customer Lifetime Value Prediction Using Embeddings
Authors:
Benjamin Paul Chamberlain,
Angelo Cardoso,
C. H. Bryan Liu,
Roberto Pagliari,
Marc Peter Deisenroth
Abstract:
We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS.com, a global online fashion retailer. CLTV prediction is an important problem in e-commerce, where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high-value customers and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shopping experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We detail the system deployed at ASOS and show that learning feature representations is a promising extension to the state of the art in CLTV modelling. We propose a novel way to generate customer embeddings that addresses the issue of the ever-changing product catalogue, and we obtain a significant improvement over an exhaustive set of handcrafted features.
Submitted 6 July, 2017; v1 submitted 7 March, 2017;
originally announced March 2017.