-
Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing
Authors:
Ravi Kumar,
Shahin Boluki,
Karl Isler,
Jonas Rauch,
Darius Walczak
Abstract:
We consider the problem of dynamic pricing of a product in the presence of feature-dependent price sensitivity. Developing practical algorithms that can estimate price elasticities robustly, especially when information about no purchases (losses) is not available, to drive such automated pricing systems is a challenge faced by many industries. Based on the Poisson semi-parametric approach, we cons…
▽ More
We consider the problem of dynamic pricing of a product in the presence of feature-dependent price sensitivity. Developing practical algorithms that can estimate price elasticities robustly, especially when information about no purchases (losses) is not available, to drive such automated pricing systems is a challenge faced by many industries. Based on the Poisson semi-parametric approach, we construct a flexible yet interpretable demand model where the price related part is parametric while the remaining (nuisance) part of the model is non-parametric and can be modeled via sophisticated machine learning (ML) techniques. The estimation of price-sensitivity parameters of this model via direct one-stage regression techniques may lead to biased estimates due to regularization. To address this concern, we propose a two-stage estimation methodology which makes the estimation of the price-sensitivity parameters robust to biases in the estimators of the nuisance parameters of the model. In the first-stage we construct estimators of observed purchases and prices given the feature vector using sophisticated ML estimators such as deep neural networks. Utilizing the estimators from the first-stage, in the second-stage we leverage a Bayesian dynamic generalized linear model to estimate the price-sensitivity parameters. We test the performance of the proposed estimation schemes on simulated and real sales transaction data from the Airline industry. Our numerical studies demonstrate that our proposed two-stage approach reduces the estimation error in price-sensitivity parameters from 25\% to 4\% in realistic simulation settings. The two-stage estimation techniques proposed in this work allows practitioners to leverage modern ML techniques to robustly estimate price-sensitivities while still maintaining interpretability and allowing ease of validation of its various constituent parts.
△ Less
Submitted 19 December, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
VFDS: Variational Foresight Dynamic Selection in Bayesian Neural Networks for Efficient Human Activity Recognition
Authors:
Randy Ardywibowo,
Shahin Boluki,
Zhangyang Wang,
Bobak Mortazavi,
Shuai Huang,
Xiaoning Qian
Abstract:
In many machine learning tasks, input features with varying degrees of predictive capability are acquired at varying costs. In order to optimize the performance-cost trade-off, one would select features to observe a priori. However, given the changing context with previous observations, the subset of predictive features to select may change dynamically. Therefore, we face the challenging new probl…
▽ More
In many machine learning tasks, input features with varying degrees of predictive capability are acquired at varying costs. In order to optimize the performance-cost trade-off, one would select features to observe a priori. However, given the changing context with previous observations, the subset of predictive features to select may change dynamically. Therefore, we face the challenging new problem of foresight dynamic selection (FDS): finding a dynamic and light-weight policy to decide which features to observe next, before actually observing them, for overall performance-cost trade-offs. To tackle FDS, this paper proposes a Bayesian learning framework of Variational Foresight Dynamic Selection (VFDS). VFDS learns a policy that selects the next feature subset to observe, by optimizing a variational Bayesian objective that characterizes the trade-off between model performance and feature cost. At its core is an implicit variational distribution on binary gates that are dependent on previous observations, which will select the next subset of features to observe. We apply VFDS on the Human Activity Recognition (HAR) task where the performance-cost trade-off is critical in its practice. Extensive results demonstrate that VFDS selects different features under changing contexts, notably saving sensory costs while maintaining or improving the HAR accuracy. Moreover, the features that VFDS dynamically select are shown to be interpretable and associated with the different activity types. We will release the code.
△ Less
Submitted 31 March, 2022;
originally announced April 2022.
-
SimCD: Simultaneous Clustering and Differential expression analysis for single-cell transcriptomic data
Authors:
Seyednami Niyakan,
Ehsan Hajiramezanali,
Shahin Boluki,
Siamak Zamani Dadaneh,
Xiaoning Qian
Abstract:
Single-Cell RNA sequencing (scRNA-seq) measurements have facilitated genome-scale transcriptomic profiling of individual cells, with the hope of deconvolving cellular dynamic changes in corresponding cell sub-populations to better understand molecular mechanisms of different development processes. Several scRNA-seq analysis methods have been proposed to first identify cell sub-populations by clust…
▽ More
Single-Cell RNA sequencing (scRNA-seq) measurements have facilitated genome-scale transcriptomic profiling of individual cells, with the hope of deconvolving cellular dynamic changes in corresponding cell sub-populations to better understand molecular mechanisms of different development processes. Several scRNA-seq analysis methods have been proposed to first identify cell sub-populations by clustering and then separately perform differential expression analysis to understand gene expression changes. Their corresponding statistical models and inference algorithms are often designed disjointly. We develop a new method -- SimCD -- that explicitly models cell heterogeneity and dynamic differential changes in one unified hierarchical gamma-negative binomial (hGNB) model, allowing simultaneous cell clustering and differential expression analysis for scRNA-seq data. Our method naturally defines cell heterogeneity by dynamic expression changes, which is expected to help achieve better performances on the two tasks compared to the existing methods that perform them separately. In addition, SimCD better models dropout (zero inflation) in scRNA-seq data by both cell- and gene-level factors and obviates the need for sophisticated pre-processing steps such as normalization, thanks to the direct modeling of scRNA-seq count data by the rigorous hGNB model with an efficient Gibbs sampling inference algorithm. Extensive comparisons with the state-of-the-art methods on both simulated and real-world scRNA-seq count data demonstrate the capability of SimCD to discover cell clusters and capture dynamic expression changes. Furthermore, SimCD helps identify several known genes affected by food deprivation in hypothalamic neuron cell subtypes as well as some new potential markers, suggesting the capability of SimCD for bio-marker discovery.
△ Less
Submitted 3 April, 2021;
originally announced April 2021.
-
NADS: Neural Architecture Distribution Search for Uncertainty Awareness
Authors:
Randy Ardywibowo,
Shahin Boluki,
Xinyu Gong,
Zhangyang Wang,
Xiaoning Qian
Abstract:
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with testing data coming from a distribution different from training data. It becomes important for ML systems in critical applications to accurately quantify its predictive uncertainty and screen out these anomalous inputs. However, existing OoD detection approaches are prone to errors and even sometimes a…
▽ More
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with testing data coming from a distribution different from training data. It becomes important for ML systems in critical applications to accurately quantify its predictive uncertainty and screen out these anomalous inputs. However, existing OoD detection approaches are prone to errors and even sometimes assign higher likelihoods to OoD samples. Unlike standard learning tasks, there is currently no well established guiding principle for designing OoD detection architectures that can accurately quantify uncertainty. To address these problems, we first seek to identify guiding principles for designing uncertainty-aware architectures, by proposing Neural Architecture Distribution Search (NADS). NADS searches for a distribution of architectures that perform well on a given task, allowing us to identify common building blocks among all uncertainty-aware architectures. With this formulation, we are able to optimize a stochastic OoD detection objective and construct an ensemble of models to perform OoD detection. We perform multiple OoD detection experiments and observe that our NADS performs favorably, with up to 57% improvement in accuracy compared to state-of-the-art methods among 15 different testing configurations.
△ Less
Submitted 11 June, 2020;
originally announced June 2020.
-
Bayesian Graph Neural Networks with Adaptive Connection Sampling
Authors:
Arman Hasanzadeh,
Ehsan Hajiramezanali,
Shahin Boluki,
Mingyuan Zhou,
Nick Duffield,
Krishna Narayanan,
Xiaoning Qian
Abstract:
We propose a unified framework for adaptive connection sampling in graph neural networks (GNNs) that generalizes existing stochastic regularization methods for training GNNs. The proposed framework not only alleviates over-smoothing and over-fitting tendencies of deep GNNs, but also enables learning with uncertainty in graph analytic tasks with GNNs. Instead of using fixed sampling rates or hand-t…
▽ More
We propose a unified framework for adaptive connection sampling in graph neural networks (GNNs) that generalizes existing stochastic regularization methods for training GNNs. The proposed framework not only alleviates over-smoothing and over-fitting tendencies of deep GNNs, but also enables learning with uncertainty in graph analytic tasks with GNNs. Instead of using fixed sampling rates or hand-tuning them as model hyperparameters in existing stochastic regularization methods, our adaptive connection sampling can be trained jointly with GNN model parameters in both global and local fashions. GNN training with adaptive connection sampling is shown to be mathematically equivalent to an efficient approximation of training Bayesian GNNs. Experimental results with ablation studies on benchmark datasets validate that adaptively learning the sampling rate given graph training data is the key to boost the performance of GNNs in semi-supervised node classification, less prone to over-smoothing and over-fitting with more robust prediction.
△ Less
Submitted 30 June, 2020; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator
Authors:
Siamak Zamani Dadaneh,
Shahin Boluki,
Mingzhang Yin,
Mingyuan Zhou,
Xiaoning Qian
Abstract:
Semantic hashing has become a crucial component of fast similarity search in many large-scale information retrieval systems, in particular, for text data. Variational auto-encoders (VAEs) with binary latent variables as hashing codes provide state-of-the-art performance in terms of precision for document retrieval. We propose a pairwise loss function with discrete latent VAE to reward within-class…
▽ More
Semantic hashing has become a crucial component of fast similarity search in many large-scale information retrieval systems, in particular, for text data. Variational auto-encoders (VAEs) with binary latent variables as hashing codes provide state-of-the-art performance in terms of precision for document retrieval. We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing. Instead of solving the optimization relying on existing biased gradient estimators, an unbiased low-variance gradient estimator is adopted to optimize the hashing function by evaluating the non-differentiable loss function over two correlated sets of binary hashing codes to control the variance of gradient estimates. This new semantic hashing framework achieves superior performance compared to the state-of-the-arts, as demonstrated by our comprehensive experiments.
△ Less
Submitted 21 May, 2020;
originally announced May 2020.
-
Learnable Bernoulli Dropout for Bayesian Deep Learning
Authors:
Shahin Boluki,
Randy Ardywibowo,
Siamak Zamani Dadaneh,
Mingyuan Zhou,
Xiaoning Qian
Abstract:
In this work, we propose learnable Bernoulli dropout (LBD), a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters. By probabilistic modeling of Bernoulli dropout, our method enables more robust prediction and uncertainty quantification in deep models. Especially, when combined with variational auto-encoders (VAEs), LBD enab…
▽ More
In this work, we propose learnable Bernoulli dropout (LBD), a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters. By probabilistic modeling of Bernoulli dropout, our method enables more robust prediction and uncertainty quantification in deep models. Especially, when combined with variational auto-encoders (VAEs), LBD enables flexible semi-implicit posterior representations, leading to new semi-implicit VAE~(SIVAE) models. We solve the optimization for training with respect to the dropout parameters using Augment-REINFORCE-Merge (ARM), an unbiased and low-variance gradient estimator. Our experiments on a range of tasks show the superior performance of our approach compared with other commonly used dropout schemes. Overall, LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation. Moreover, using SIVAE, we can achieve state-of-the-art performance on collaborative filtering for implicit feedback on several public datasets.
△ Less
Submitted 12 February, 2020;
originally announced February 2020.
-
ARSM Gradient Estimator for Supervised Learning to Rank
Authors:
Siamak Zamani Dadaneh,
Shahin Boluki,
Mingyuan Zhou,
Xiaoning Qian
Abstract:
We propose a new model for supervised learning to rank. In our model, the relevance labels are assumed to follow a categorical distribution whose probabilities are constructed based on a scoring function. We optimize the training objective with respect to the multivariate categorical variables with an unbiased and low-variance gradient estimator. Learning-to-rank methods can generally be categoriz…
▽ More
We propose a new model for supervised learning to rank. In our model, the relevance labels are assumed to follow a categorical distribution whose probabilities are constructed based on a scoring function. We optimize the training objective with respect to the multivariate categorical variables with an unbiased and low-variance gradient estimator. Learning-to-rank methods can generally be categorized into pointwise, pairwise, and listwise approaches. Although our scoring function is pointwise, the proposed framework permits flexibility over the choice of the loss function. In our new model, the loss function need not be differentiable and can either be pointwise or listwise. Our proposed method achieves better or comparable results on two datasets compared with existing pairwise and listwise methods.
△ Less
Submitted 18 February, 2020; v1 submitted 1 November, 2019;
originally announced November 2019.
-
Optimal Clustering with Missing Values
Authors:
Shahin Boluki,
Siamak Zamani Dadaneh,
Xiaoning Qian,
Edward R. Dougherty
Abstract:
Missing values frequently arise in modern biomedical studies due to various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goals are to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is t…
▽ More
Missing values frequently arise in modern biomedical studies due to various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goals are to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values, and then apply the clustering algorithm on the completed data.
We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values when compared to various clustering approaches. Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data, while at the same time possessing smaller clustering errors.
△ Less
Submitted 25 February, 2019;
originally announced February 2019.
-
Experimental Design via Generalized Mean Objective Cost of Uncertainty
Authors:
Shahin Boluki,
Xiaoning Qian,
Edward R. Dougherty
Abstract:
The mean objective cost of uncertainty (MOCU) quantifies the performance cost of using an operator that is optimal across an uncertainty class of systems as opposed to using an operator that is optimal for a particular system. MOCU-based experimental design selects an experiment to maximally reduce MOCU, thereby gaining the greatest reduction of uncertainty impacting the operational objective. The…
▽ More
The mean objective cost of uncertainty (MOCU) quantifies the performance cost of using an operator that is optimal across an uncertainty class of systems as opposed to using an operator that is optimal for a particular system. MOCU-based experimental design selects an experiment to maximally reduce MOCU, thereby gaining the greatest reduction of uncertainty impacting the operational objective. The original formulation applied to finding optimal system operators, where optimality is with respect to a cost function, such as mean-square error; and the prior distribution governing the uncertainty class relates directly to the underlying physical system. Here we provide a generalized MOCU and the corresponding experimental design. We then demonstrate how this new formulation includes as special cases MOCU-based experimental design methods developed for materials science and genomic networks when there is experimental error. Most importantly, we show that the classical Knowledge Gradient and Efficient Global Optimization experimental design procedures are actually implementations of MOCU-based experimental design under their modeling assumptions.
△ Less
Submitted 3 May, 2018;
originally announced May 2018.