-
Adaptive Sampling for Estimating Distributions: A Bayesian Upper Confidence Bound Approach
Authors:
Dhruva Kartik,
Neeraj Sood,
Urbashi Mitra,
Tara Javidi
Abstract:
The problem of adaptive sampling for estimating probability mass functions (pmf) uniformly well is considered. Performance of the sampling strategy is measured in terms of the worst-case mean squared error. A Bayesian variant of the existing upper confidence bound (UCB) based approaches is proposed. It is shown analytically that the performance of this Bayesian variant is no worse than the existing approaches. The posterior distribution on the pmfs in the Bayesian setting allows for a tighter computation of upper confidence bounds which leads to significant performance gains in practice. Using this approach, adaptive sampling protocols are proposed for estimating SARS-CoV-2 seroprevalence in various groups such as location and ethnicity. The effectiveness of this strategy is discussed using data obtained from a seroprevalence survey in Los Angeles county.
Submitted 7 December, 2020;
originally announced December 2020.
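The adaptive sampling idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes binary outcomes per group (for example, seropositive or not), a uniform Beta(1, 1) prior, and a Monte Carlo estimate of an upper posterior quantile of the variance p(1 - p). The function names, the 0.95 quantile, and the number of posterior draws are all illustrative assumptions.

```python
import random


def posterior_variance_ucb(successes, trials, rng, quantile=0.95, draws=200):
    """Monte Carlo upper posterior quantile of p * (1 - p).

    Assumes a Beta(1 + successes, 1 + failures) posterior, i.e. a uniform
    prior on the group's positive rate (an illustrative choice).
    """
    failures = trials - successes
    samples = sorted(
        p * (1.0 - p)
        for p in (rng.betavariate(1 + successes, 1 + failures) for _ in range(draws))
    )
    return samples[min(draws - 1, int(quantile * draws))]


def adaptive_sample(true_rates, budget, seed=0):
    """Allocate `budget` samples across groups, one at a time.

    Each step samples the group whose upper bound on estimation error
    (variance bound divided by samples taken so far) is largest, which
    targets the worst-case mean squared error across groups.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    trials = [0] * k
    for _ in range(budget):
        scores = [
            posterior_variance_ucb(successes[g], trials[g], rng) / max(trials[g], 1)
            for g in range(k)
        ]
        g = max(range(k), key=scores.__getitem__)
        trials[g] += 1
        # Simulated observation; in a real survey this would be a test result.
        successes[g] += rng.random() < true_rates[g]
    return trials, successes


if __name__ == "__main__":
    trials, successes = adaptive_sample([0.5, 0.02], budget=600, seed=1)
    print("samples per group:", trials)
    print("positives per group:", successes)
```

In this sketch, a group whose rate is near 0.5 has a larger variance bound and therefore draws more of the budget than a group whose rate is near 0 or 1.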
-
SmartTriage: A system for personalized patient data capture, documentation generation, and decision support
Authors:
Ilya Valmianski,
Nave Frost,
Navdeep Sood,
Yang Wang,
Baodong Liu,
James J. Zhu,
Sunil Karumuri,
Ian M. Finn,
Daniel S. Zisook
Abstract:
Symptom checkers have emerged as an important tool for collecting symptoms and diagnosing patients, minimizing the involvement of clinical personnel. We developed a machine-learning-backed system, SmartTriage, which goes beyond conventional symptom checking through a tight bi-directional integration with the electronic medical record (EMR). Conditioned on EMR-derived patient history, our system identifies the patient's chief complaint from a free-text entry and then asks a series of discrete questions to obtain relevant symptomatology. The patient-specific data are used to predict detailed ICD-10-CM codes as well as medication, laboratory, and imaging orders. Patient responses and clinical decision support (CDS) predictions are then inserted back into the EMR. To train the machine learning components of SmartTriage, we employed novel data sets of over 25 million primary care encounters and 1 million patient free-text reason-for-visit entries. These data sets were used to construct: (1) a long short-term memory (LSTM) based patient history representation, (2) a fine-tuned transformer model for chief complaint extraction, (3) a random forest model for question sequencing, and (4) a feed-forward network for CDS predictions. In total, our system supports 337 patient chief complaints, which together make up $>90\%$ of all primary care encounters at Kaiser Permanente.
Submitted 11 November, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Building a Competitive Associative Classifier
Authors:
Nitakshi Sood,
Osmar Zaiane
Abstract:
With the huge success of deep learning, other machine learning paradigms have had to take a back seat. Yet other models, particularly rule-based ones, are more readable and explainable and can even be competitive when labelled data is not abundant. However, most of the existing rule-based classifiers suffer from the production of a large number of classification rules, affecting the model readability. This hampers the classification accuracy, as noisy rules might not add any useful information for classification and also lead to longer classification time. In this study, we propose SigD2, which uses a novel two-stage pruning strategy that prunes most of the noisy, redundant and uninteresting rules and makes the classification model more accurate and readable. To make SigDirect more competitive with the most prevalent but uninterpretable machine learning-based classifiers like neural networks and support vector machines, we propose bagging and boosting on the ensemble of the SigDirect classifier. The results of the proposed algorithms are quite promising and we are able to obtain a minimal set of statistically significant rules for classification without jeopardizing the classification accuracy. We use 15 UCI datasets and compare our approach with eight existing systems. The SigD2 and boosted SigDirect (ACboost) ensemble models outperform various state-of-the-art classifiers not only in terms of classification accuracy but also in terms of the number of rules.
Submitted 3 July, 2020;
originally announced July 2020.
-
Learning and Testing Sub-groups with Heterogeneous Treatment Effects: A Sequence of Two Studies
Authors:
Rahul Ladhania,
Amelia Haviland,
Neeraj Sood,
Edward Kennedy,
Ateev Mehrotra
Abstract:
There is strong interest in estimating how the magnitude of treatment effects of an intervention varies across sub-groups of the population of interest. In our paper, we propose a two-study approach to first propose and then test heterogeneous treatment effects. In Study 1, we use a large observational dataset to learn sub-groups with the most distinctive treatment-outcome relationships ('high/low-impact sub-groups'). We adopt a model-based recursive partitioning approach to propose the high/low-impact sub-groups, and validate them by using sample-splitting. While the first study rules out noise, there is potential bias in our estimated heterogeneous treatment effects. Study 2 uses an experimental design, and here we classify our sample units based on sub-groups learned in Study 1. We then estimate treatment effects within each of the groups, thereby testing the causal hypotheses proposed in Study 1. Using patient claims data from the NBER MarketScan database, we apply our approach to estimate heterogeneous effects of a switch to a high-deductible health insurance plan on use of outpatient care by patients with a common chronic condition. We extend the method to non-parametrically learn the sub-groups in Study 1. We also compare the methods' performance to other state-of-the-art methods in the literature that make use only of the Study 2 data.
Submitted 20 June, 2020;
originally announced June 2020.
-
No-Regret Replanning under Uncertainty
Authors:
Wen Sun,
Niteesh Sood,
Debadeepta Dey,
Gireeja Ranade,
Siddharth Prakash,
Ashish Kapoor
Abstract:
This paper explores the problem of path planning under uncertainty. Specifically, we consider online receding-horizon-based planners that need to operate in a latent environment where the latent information can be modeled via Gaussian Processes. Online path planning in latent environments is challenging since the robot needs to explore the environment to get a more accurate model of latent information for better planning later, while also achieving the task as quickly as possible. We propose UCB-style algorithms that are popular in the bandit settings and show how those analyses can be adapted to the online robotic path planning problems. The proposed algorithm trades off exploration and exploitation in a near-optimal manner and has appealing no-regret properties. We demonstrate the efficacy of the framework on the application of aircraft flight path planning when the winds are partially observed.
Submitted 16 September, 2016;
originally announced September 2016.