doi 10.25560/110307

Some Statistical and Data Challenges When Building Early-Stage Digital Experimentation and Measurement Capabilities

Abstract: Digital experimentation and measurement (DEM) capabilities -- the knowledge and tools necessary to run experiments with digital products, services, or experiences and measure their impact -- are fast becoming part of the standard toolkit of digital/data-driven organisations in guiding business decisions. Many large technology companies report having mature DEM capabilities, and several businesses have been established purely to manage experiments for others. Given the growing evidence that data-driven organisations tend to outperform their non-data-driven counterparts, there has never been a greater need for organisations to build/acquire DEM capabilities to thrive in the current digital era. This thesis presents several novel approaches to statistical and data challenges for organisations building DEM capabilities. We focus on the fundamentals associated with building DEM capabilities, which lead to a richer understanding of the underlying assumptions and thus enable us to develop more appropriate capabilities. We address why one should engage in DEM by quantifying the benefits and risks of acquiring DEM capabilities. This is done using a ranking under lower uncertainty model, enabling one to construct a business case. We also examine what ingredients are necessary to run digital experiments. In addition to clarifying the existing literature around statistical tests, datasets, and methods in experimental design and causal inference, we construct an additional dataset and detailed case studies on applying state-of-the-art methods. Finally, we investigate when a digital experiment design would outperform another, leading to an evaluation framework that compares competing designs' data efficiency.

Submitted 6 May, 2024; originally announced May 2024.

Comments: PhD thesis. Imperial College London. Official library version available on: https://spiral.imperial.ac.uk/handle/10044/1/110307

arXiv:2111.10198 [pdf, other]

Datasets for Online Controlled Experiments

Authors: C. H. Bryan Liu, Ângelo Cardoso, Paul Couturier, Emma J. McCoy

Abstract: Online Controlled Experiments (OCE) are the gold standard to measure impact and guide decisions for digital products and services. Despite many methodological advances in this area, the scarcity of public datasets and the lack of a systematic review and categorization hinder its development. We present the first survey and taxonomy for OCE datasets, which highlight the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach to enable quickly deploying improvements or rolling back degrading changes. We release the first such dataset, containing daily checkpoints of decision metrics from multiple, real experiments run on a global e-commerce platform. The dataset design is guided by a broader discussion on data requirements for common statistical tests used in digital experimentation. We demonstrate how to use the dataset in the adaptive stopping scenario using sequential and Bayesian hypothesis tests and learn the relevant parameters for each approach.

Submitted 14 January, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. 17 pages, 2 figures, 2 tables. Dataset available on Open Science Framework: https://osf.io/64jsb/

arXiv:1807.04098 [pdf, other]

doi 10.1007/978-3-030-10997-4_10

A Recurrent Neural Network Survival Model: Predicting Web User Return Time

Authors: Georg L. Grob, Ângelo Cardoso, C. H. Bryan Liu, Duncan A. Little, Benjamin Paul Chamberlain

Abstract: The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state of the art approaches to solve this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but can not be directly trained with examples of non-returning users who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state of the art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset with a superior ability to discriminate between returning and non-returning users than either method applied in isolation.

Submitted 11 July, 2018; originally announced July 2018.

Comments: Accepted into ECML PKDD 2018; 8 figures and 1 table

Journal ref: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science, vol 11053. pp 152-168

arXiv:1806.02588 [pdf, other]

Designing Experiments to Measure Incrementality on Facebook

Authors: C. H. Bryan Liu, Elaine M. Bettaney, Benjamin Paul Chamberlain

Abstract: The importance of Facebook advertising has risen dramatically in recent years, with the platform accounting for almost 20% of the global online ad spend in 2017. An important consideration in advertising is incrementality: how much of the change in an experimental metric is an advertising campaign responsible for. To measure incrementality, Facebook provide lift studies. As Facebook lift studies differ from standard A/B tests, the online experimentation literature does not describe how to calculate parameters such as power and minimum sample size. Facebook also offer multi-cell lift tests, which can be used to compare campaigns that don't have statistically identical audiences. In this case, there is no literature describing how to measure the significance of the difference in incrementality between cells, or how to estimate the power or minimum sample size. We fill these gaps in the literature by providing the statistical power and required sample size calculation for Facebook lift studies. We then generalise the statistical significance, power, and required sample size calculation to multi-cell lift studies. We represent our results theoretically in terms of the distributions of test metrics and in practical terms relating to the metrics used by practitioners, making all of our code publicly available.

Submitted 11 July, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

Comments: Accepted into 2018 AdKDD & TargetAd Workshop in conjunction with KDD 2018; 6 pages, 4 figures, and 2 tables

arXiv:1803.06258 [pdf, other]

Online Controlled Experiments for Personalised e-Commerce Strategies: Design, Challenges, and Pitfalls

Authors: C. H. Bryan Liu, Benjamin Paul Chamberlain

Abstract: Online controlled experiments are the primary tool for measuring the causal impact of product changes in digital businesses. It is increasingly common for digital products and services to interact with customers in a personalised way. Using online controlled experiments to optimise personalised interaction strategies is challenging because the usual assumption of statistically equivalent user groups is violated. Additionally, challenges are introduced by users qualifying for strategies based on dynamic, stochastic attributes. Traditional A/B tests can salvage statistical equivalence by pre-allocating users to control and exposed groups, but this dilutes the experimental metrics and reduces the test power. We present a stacked incrementality test framework that addresses problems with running online experiments for personalised user strategies. We derive bounds that show that our framework is superior to the best simple A/B test given enough users and that this condition is easily met for large scale online experiments. In addition, we provide a test power calculator and describe a selection of pitfalls and lessons learnt from our experience using it.

Submitted 1 July, 2021; v1 submitted 16 March, 2018; originally announced March 2018.

Comments: Not peer-reviewed but retained for historic interest. Removed an erroneous statement on Welch's t-test assumptions in Section 3.2. 9 pages, 7 figures

arXiv:1712.01209 [pdf, other]

doi 10.4230/OASIcs.ICCSW.2018.1

Speeding Up BigClam Implementation on SNAP

Authors: C. H. Bryan Liu, Benjamin Paul Chamberlain

Abstract: We perform a detailed analysis of the C++ implementation of the Cluster Affiliation Model for Big Networks (BigClam) on the Stanford Network Analysis Project (SNAP). BigClam is a popular graph mining algorithm that is capable of finding overlapping communities in networks containing millions of nodes. Our analysis shows a key stage of the algorithm - determining if a node belongs to a community - dominates the runtime of the implementation, yet the computation is not parallelized. We show that by parallelizing computations across multiple threads using OpenMP we can speed up the algorithm by 5.3 times when solving large networks for communities, while preserving the integrity of the program and the result.

Submitted 4 September, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

Comments: To appear in 2018 Imperial College Computing Student Workshop (ICCSW'18); 12 pages, 4 figures, and 3 tables

Journal ref: 2018 Imperial College Computing Student Workshop (ICCSW 2018). OpenAccess Series in Informatics (OASIcs), vol. 66, pp. 1:1-1:13

arXiv:1706.09865 [pdf, other]

doi 10.1007/978-3-319-71273-4_9

Generalising Random Forest Parameter Optimisation to Include Stability and Cost

Authors: C. H. Bryan Liu, Benjamin Paul Chamberlain, Duncan A. Little, Angelo Cardoso

Abstract: Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, the parameters of random forests must be carefully tuned. This is usually done by choosing values that minimize the prediction error on a held out dataset. We argue that error reduction is only one of several metrics that must be considered when optimizing random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forests predictions, which we argue is key for scenarios that require successive predictions. We motivate the need for multi-criteria optimization by showing that in practical applications, simply choosing the parameters that lead to the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimize this multi-criteria trade-off, we present a new framework that efficiently finds a principled balance between these three considerations using Bayesian optimisation. The pitfalls of optimising forest parameters purely for error reduction are demonstrated using two publicly available real world datasets. We show that our framework leads to parameter settings that are markedly different from the values discovered by error reduction metrics.

Submitted 13 July, 2017; v1 submitted 29 June, 2017; originally announced June 2017.

Comments: To appear in ECML-PKDD 2017

Journal ref: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. LNCS vol 10536, pp. 102-113 (2017)

arXiv:1703.02596 [pdf, other]

doi 10.1145/3097983.3098123

Customer Lifetime Value Prediction Using Embeddings

Authors: Benjamin Paul Chamberlain, Angelo Cardoso, C. H. Bryan Liu, Roberto Pagliari, Marc Peter Deisenroth

Abstract: We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS.com, a global online fashion retailer. CLTV prediction is an important problem in e-commerce where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high value customers and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shopping experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We detail the system deployed at ASOS and show that learning feature representations is a promising extension to the state of the art in CLTV modelling. We propose a novel way to generate embeddings of customers, which addresses the issue of the ever changing product catalogue and obtain a significant improvement over an exhaustive set of handcrafted features.

Submitted 6 July, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

Comments: 10 pages, 11 figures

Journal ref: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pages 1753-1762, 2017

Showing 1–8 of 8 results for author: Liu, C H B