Search | arXiv e-print repository

Improving Coverage in Combined Prediction Sets with Weighted p-values

Authors: Gina Wong, Drew Prinster, Suchi Saria, Rama Chellappa, Anqi Liu

Abstract: Conformal prediction quantifies the uncertainty of machine learning models by augmenting point predictions with valid prediction sets, assuming exchangeability. For complex scenarios involving multiple trials, models, or data sources, conformal prediction sets can be aggregated to create a prediction set that captures the overall uncertainty, often improving precision. However, aggregating multipl… ▽ More Conformal prediction quantifies the uncertainty of machine learning models by augmenting point predictions with valid prediction sets, assuming exchangeability. For complex scenarios involving multiple trials, models, or data sources, conformal prediction sets can be aggregated to create a prediction set that captures the overall uncertainty, often improving precision. However, aggregating multiple prediction sets with individual $1-α$ coverage inevitably weakens the overall guarantee, typically resulting in $1-2α$ worst-case coverage. In this work, we propose a framework for the weighted aggregation of prediction sets, where weights are assigned to each prediction set based on their contribution. Our framework offers flexible control over how the sets are aggregated, achieving tighter coverage bounds that interpolate between the $1-2α$ guarantee of the combined models and the $1-α$ guarantee of an individual model depending on the distribution of weights. We extend our framework to data-dependent weights, and we derive a general procedure for data-dependent weight aggregation that maintains finite-sample validity. We demonstrate the effectiveness of our methods through experiments on synthetic and real data in the mixture-of-experts setting, and we show that aggregation with data-dependent weights provides a form of adaptive coverage. △ Less

Submitted 16 May, 2025; originally announced May 2025.

arXiv:2505.04608 [pdf, ps, other]

WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales

Authors: Drew Prinster, Xing Han, Anqi Liu, Suchi Saria

Abstract: Responsibly deploying artificial intelligence (AI) / machine learning (ML) systems in high-stakes settings arguably requires not only proof of system reliability, but also continual, post-deployment monitoring to quickly detect and address any unsafe behavior. Methods for nonparametric sequential testing -- especially conformal test martingales (CTMs) and anytime-valid inference -- offer promising… ▽ More Responsibly deploying artificial intelligence (AI) / machine learning (ML) systems in high-stakes settings arguably requires not only proof of system reliability, but also continual, post-deployment monitoring to quickly detect and address any unsafe behavior. Methods for nonparametric sequential testing -- especially conformal test martingales (CTMs) and anytime-valid inference -- offer promising tools for this monitoring task. However, existing approaches are restricted to monitoring limited hypothesis classes or ``alarm criteria'' (e.g., detecting data shifts that violate certain exchangeability or IID assumptions), do not allow for online adaptation in response to shifts, and/or cannot diagnose the cause of degradation or alarm. In this paper, we address these limitations by proposing a weighted generalization of conformal test martingales (WCTMs), which lay a theoretical foundation for online monitoring for any unexpected changepoints in the data distribution while controlling false-alarms. For practical applications, we propose specific WCTM algorithms that adapt online to mild covariate shifts (in the marginal input distribution), quickly detect harmful shifts, and diagnose those harmful shifts as concept shifts (in the conditional label distribution) or extreme (out-of-support) covariate shifts that cannot be easily adapted to. On real-world datasets, we demonstrate improved performance relative to state-of-the-art baselines. △ Less

Submitted 1 June, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

Comments: To be published in The International Conference on Machine Learning (ICML), 2025

arXiv:2410.02935 [pdf, other]

On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions

Authors: Huy Nguyen, Xing Han, Carl Harris, Suchi Saria, Nhat Ho

Abstract: With the growing prominence of the Mixture of Experts (MoE) architecture in developing large-scale foundation models, we investigate the Hierarchical Mixture of Experts (HMoE), a specialized variant of MoE that excels in handling complex inputs and improving performance on targeted tasks. Our analysis highlights the advantages of using the Laplace gating function over the traditional Softmax gatin… ▽ More With the growing prominence of the Mixture of Experts (MoE) architecture in developing large-scale foundation models, we investigate the Hierarchical Mixture of Experts (HMoE), a specialized variant of MoE that excels in handling complex inputs and improving performance on targeted tasks. Our analysis highlights the advantages of using the Laplace gating function over the traditional Softmax gating within the HMoE frameworks. We theoretically demonstrate that applying the Laplace gating function at both levels of the HMoE model helps eliminate undesirable parameter interactions caused by the Softmax gating and, therefore, accelerates the expert convergence as well as enhances the expert specialization. Empirical validation across diverse scenarios supports these theoretical claims. This includes large-scale multimodal tasks, image classification, and latent domain discovery and prediction tasks, where our modified HMoE models show great performance improvements compared to the conventional HMoE models. △ Less

Submitted 6 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

Comments: Huy Nguyen and Xing Han contributed equally to this work

arXiv:2405.06627 [pdf, other]

Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)

Authors: Drew Prinster, Samuel Stanton, Anqi Liu, Suchi Saria

Abstract: As artificial intelligence (AI) / machine learning (ML) gain widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when such systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the da… ▽ More As artificial intelligence (AI) / machine learning (ML) gain widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when such systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction is a promising approach to uncertainty and risk quantification, but prior variants' validity guarantees have assumed some form of ``quasi-exchangeability'' on the data distribution, thereby excluding many types of sequential shifts. In this paper we prove that conformal prediction can theoretically be extended to \textit{any} joint data distribution, not just exchangeable or quasi-exchangeable ones. Although the most general case is exceedingly impractical to compute, for concrete practical applications we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of AI/ML-agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks. △ Less

Submitted 5 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: ICML 2024. Code available at https://github.com/drewprinster/conformal-mfcs

arXiv:2207.10716 [pdf, other]

JAWS: Auditing Predictive Uncertainty Under Covariate Shift

Authors: Drew Prinster, Anqi Liu, Suchi Saria

Abstract: We propose \textbf{JAWS}, a series of wrapper methods for distribution-free uncertainty quantification tasks under covariate shift, centered on the core method \textbf{JAW}, the \textbf{JA}ckknife+ \textbf{W}eighted with data-dependent likelihood-ratio weights. JAWS also includes computationally efficient \textbf{A}pproximations of JAW using higher-order influence functions: \textbf{JAWA}. Theoret… ▽ More We propose \textbf{JAWS}, a series of wrapper methods for distribution-free uncertainty quantification tasks under covariate shift, centered on the core method \textbf{JAW}, the \textbf{JA}ckknife+ \textbf{W}eighted with data-dependent likelihood-ratio weights. JAWS also includes computationally efficient \textbf{A}pproximations of JAW using higher-order influence functions: \textbf{JAWA}. Theoretically, we show that JAW relaxes the jackknife+'s assumption of data exchangeability to achieve the same finite-sample coverage guarantee even under covariate shift. JAWA further approaches the JAW guarantee in the limit of the sample size or the influence function order under common regularity assumptions. Moreover, we propose a general approach to repurposing predictive interval-generating methods and their guarantees to the reverse task: estimating the probability that a prediction is erroneous, based on user-specified error criteria such as a safe or acceptable tolerance threshold around the true label. We then propose \textbf{JAW-E} and \textbf{JAWA-E} as the repurposed proposed methods for this \textbf{E}rror assessment task. Practically, JAWS outperform state-of-the-art predictive inference baselines in a variety of biased real world data sets for interval-generation and error-assessment predictive uncertainty auditing tasks. △ Less

Submitted 23 November, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: Thirty-sixth Conference on Neural Information Processing Systems

arXiv:2012.12449 [pdf, other]

Partial Identifiability in Discrete Data With Measurement Error

Authors: Noam Finkelstein, Roy Adams, Suchi Saria, Ilya Shpitser

Abstract: When data contains measurement errors, it is necessary to make assumptions relating the observed, erroneous data to the unobserved true phenomena of interest. These assumptions should be justifiable on substantive grounds, but are often motivated by mathematical convenience, for the sake of exactly identifying the target of inference. We adopt the view that it is preferable to present bounds under… ▽ More When data contains measurement errors, it is necessary to make assumptions relating the observed, erroneous data to the unobserved true phenomena of interest. These assumptions should be justifiable on substantive grounds, but are often motivated by mathematical convenience, for the sake of exactly identifying the target of inference. We adopt the view that it is preferable to present bounds under justifiable assumptions than to pursue exact identification under dubious ones. To that end, we demonstrate how a broad class of modeling assumptions involving discrete variables, including common measurement error and conditional independence assumptions, can be expressed as linear constraints on the parameters of the model. We then use linear programming techniques to produce sharp bounds for factual and counterfactual distributions under measurement error in such models. We additionally propose a procedure for obtaining outer bounds on non-linear models. Our method yields sharp bounds in a number of important settings -- such as the instrumental variable scenario with measurement error -- for which no bounds were previously known. △ Less

Submitted 22 December, 2020; originally announced December 2020.

arXiv:2011.15099 [pdf, other]

The Impact of Time Series Length and Discretization on Longitudinal Causal Estimation Methods

Authors: Roy Adams, Suchi Saria, Michael Rosenblum

Abstract: The use of observational time series data to assess the impact of multi-time point interventions is becoming increasingly common as more health and activity data are collected and digitized via wearables, social media, and electronic health records. Such time series may involve hundreds or thousands of irregularly sampled observations. One common analysis approach is to simplify such time series b… ▽ More The use of observational time series data to assess the impact of multi-time point interventions is becoming increasingly common as more health and activity data are collected and digitized via wearables, social media, and electronic health records. Such time series may involve hundreds or thousands of irregularly sampled observations. One common analysis approach is to simplify such time series by first discretizing them into sequences before applying a discrete-time estimation method that adjusts for time-dependent confounding. In certain settings, this discretization results in sequences with many time points; however, the empirical properties of longitudinal causal estimators have not been systematically compared on long sequences. We compare three representative longitudinal causal estimation methods on simulated and real clinical data. Our simulations and analyses assume a Markov structure and that longitudinal treatments/exposures are binary-valued and have at most a single jump point. We identify sources of bias that arise from temporally discretizing the data and provide practical guidance for discretizing data and choosing between methods when working with long sequences. Additionally, we compare these estimators on real electronic health record data, evaluating the impact of early treatment for patients with a life-threatening complication of infection called sepsis. △ Less

Submitted 30 November, 2020; originally announced November 2020.

arXiv:2010.15100 [pdf, other]

Evaluating Model Robustness and Stability to Dataset Shift

Authors: Adarsh Subbaswamy, Roy Adams, Suchi Saria

Abstract: As the use of machine learning in high impact domains becomes widespread, the importance of evaluating safety has increased. An important aspect of this is evaluating how robust a model is to changes in setting or population, which typically requires applying the model to multiple, independent datasets. Since the cost of collecting such datasets is often prohibitive, in this paper, we propose a fr… ▽ More As the use of machine learning in high impact domains becomes widespread, the importance of evaluating safety has increased. An important aspect of this is evaluating how robust a model is to changes in setting or population, which typically requires applying the model to multiple, independent datasets. Since the cost of collecting such datasets is often prohibitive, in this paper, we propose a framework for analyzing this type of stability using the available data. We use the original evaluation data to determine distributions under which the algorithm performs poorly, and estimate the algorithm's performance on the "worst-case" distribution. We consider shifts in user defined conditional distributions, allowing some distributions to shift while keeping other portions of the data distribution fixed. For example, in a healthcare context, this allows us to consider shifts in clinical practice while keeping the patient population fixed. To address the challenges associated with estimation in complex, high-dimensional distributions, we derive a "debiased" estimator which maintains $\sqrt{N}$-consistency even when machine learning methods with slower convergence rates are used to estimate the nuisance parameters. In experiments on a real medical risk prediction task, we show this estimator can be used to analyze stability and accounts for realistic shifts that could not previously be expressed. The proposed framework allows practitioners to proactively evaluate the safety of their models without requiring additional data collection. △ Less

Submitted 15 March, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

Comments: In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

arXiv:2002.08948 [pdf, other]

I-SPEC: An End-to-End Framework for Learning Transportable, Shift-Stable Models

Authors: Adarsh Subbaswamy, Suchi Saria

Abstract: Shifts in environment between development and deployment cause classical supervised learning to produce models that fail to generalize well to new target distributions. Recently, many solutions which find invariant predictive distributions have been developed. Among these, graph-based approaches do not require data from the target environment and can capture more stable information than alternativ… ▽ More Shifts in environment between development and deployment cause classical supervised learning to produce models that fail to generalize well to new target distributions. Recently, many solutions which find invariant predictive distributions have been developed. Among these, graph-based approaches do not require data from the target environment and can capture more stable information than alternative methods which find stable feature sets. However, these approaches assume that the data generating process is known in the form of a full causal graph, which is generally not the case. In this paper, we propose I-SPEC, an end-to-end framework that addresses this shortcoming by using data to learn a partial ancestral graph (PAG). Using the PAG we develop an algorithm that determines an interventional distribution that is stable to the declared shifts; this subsumes existing approaches which find stable feature sets that are less accurate. We apply I-SPEC to a mortality prediction problem to show it can learn a model that is robust to shifts without needing upfront knowledge of the full causal DAG. △ Less

Submitted 20 February, 2020; originally announced February 2020.

arXiv:1905.11374 [pdf, other]

doi 10.1515/jci-2021-0042

A Unifying Causal Framework for Analyzing Dataset Shift-stable Learning Algorithms

Authors: Adarsh Subbaswamy, Bryant Chen, Suchi Saria

Abstract: Recent interest in the external validity of prediction models (i.e., the problem of different train and test distributions, known as dataset shift) has produced many methods for finding predictive distributions that are invariant to dataset shifts and can be used for prediction in new, unseen environments. However, these methods consider different types of shifts and have been developed under disp… ▽ More Recent interest in the external validity of prediction models (i.e., the problem of different train and test distributions, known as dataset shift) has produced many methods for finding predictive distributions that are invariant to dataset shifts and can be used for prediction in new, unseen environments. However, these methods consider different types of shifts and have been developed under disparate frameworks, making it difficult to theoretically analyze how solutions differ with respect to stability and accuracy. Taking a causal graphical view, we use a flexible graphical representation to express various types of dataset shifts. Given a known graph of the data generating process, we show that all invariant distributions correspond to a causal hierarchy of graphical operators which disable the edges in the graph that are responsible for the shifts. The hierarchy provides a common theoretical underpinning for understanding when and how stability to shifts can be achieved, and in what ways stable distributions can differ. We use it to establish conditions for minimax optimal performance across environments, and derive new algorithms that find optimal stable distributions. Using this new perspective, we empirically demonstrate that that there is a tradeoff between minimax and average performance. △ Less

Submitted 18 July, 2022; v1 submitted 27 May, 2019; originally announced May 2019.

Comments: Published in the Journal of Causal Inference

Journal ref: Journal of Causal Inference, 10(1), 64-89

arXiv:1904.05268 [pdf, other]

Active Learning for Decision-Making from Imbalanced Observational Data

Authors: Iiris Sundin, Peter Schulam, Eero Siivola, Aki Vehtari, Suchi Saria, Samuel Kaski

Abstract: Machine learning can help personalized decision support by learning models to predict individual treatment effects (ITE). This work studies the reliability of prediction-based decision-making in a task of deciding which action $a$ to take for a target unit after observing its covariates $\tilde{x}$ and predicted outcomes $\hat{p}(\tilde{y} \mid \tilde{x}, a)$. An example case is personalized medic… ▽ More Machine learning can help personalized decision support by learning models to predict individual treatment effects (ITE). This work studies the reliability of prediction-based decision-making in a task of deciding which action $a$ to take for a target unit after observing its covariates $\tilde{x}$ and predicted outcomes $\hat{p}(\tilde{y} \mid \tilde{x}, a)$. An example case is personalized medicine and the decision of which treatment to give to a patient. A common problem when learning these models from observational data is imbalance, that is, difference in treated/control covariate distributions, which is known to increase the upper bound of the expected ITE estimation error. We propose to assess the decision-making reliability by estimating the ITE model's Type S error rate, which is the probability of the model inferring the sign of the treatment effect wrong. Furthermore, we use the estimated reliability as a criterion for active learning, in order to collect new (possibly expensive) observations, instead of making a forced choice based on unreliable predictions. We demonstrate the effectiveness of this decision-making aware active learning in two decision-making tasks: in simulated data with binary outcomes and in a medical dataset with synthetic and continuous treatment outcomes. △ Less

Submitted 6 June, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: Published in Proceedings of the 36th International Conference on Machine Learning (ICML) 2019. 15 pages (10 paper + 5 supplementary), 7 figures

arXiv:1901.09060 [pdf, other]

Learning Models from Data with Measurement Error: Tackling Underreporting

Authors: Roy Adams, Yuelong Ji, Xiaobin Wang, Suchi Saria

Abstract: Measurement error in observational datasets can lead to systematic bias in inferences based on these datasets. As studies based on observational data are increasingly used to inform decisions with real-world impact, it is critical that we develop a robust set of techniques for analyzing and adjusting for these biases. In this paper we present a method for estimating the distribution of an outcome… ▽ More Measurement error in observational datasets can lead to systematic bias in inferences based on these datasets. As studies based on observational data are increasingly used to inform decisions with real-world impact, it is critical that we develop a robust set of techniques for analyzing and adjusting for these biases. In this paper we present a method for estimating the distribution of an outcome given a binary exposure that is subject to underreporting. Our method is based on a missing data view of the measurement error problem, where the true exposure is treated as a latent variable that is marginalized out of a joint model. We prove three different conditions under which the outcome distribution can still be identified from data containing only error-prone observations of the exposure. We demonstrate this method on synthetic data and analyze its sensitivity to near violations of the identifiability conditions. Finally, we use this method to estimate the effects of maternal smoking and opioid use during pregnancy on childhood obesity, two import problems from public health. Using the proposed method, we estimate these effects using only subject-reported drug use data and substantially refine the range of estimates generated by a sensitivity analysis-based approach. Further, the estimates produced by our method are consistent with existing literature on both the effects of maternal smoking and the rate at which subjects underreport smoking. △ Less

Submitted 25 January, 2019; originally announced January 2019.

arXiv:1901.00403 [pdf, other]

Can You Trust This Prediction? Auditing Pointwise Reliability After Learning

Authors: Peter Schulam, Suchi Saria

Abstract: To use machine learning in high stakes applications (e.g. medicine), we need tools for building confidence in the system and evaluating whether it is reliable. Methods to improve model reliability often require new learning algorithms (e.g. using Bayesian inference to obtain uncertainty estimates). An alternative is to audit a model after it is trained. In this paper, we describe resampling uncert… ▽ More To use machine learning in high stakes applications (e.g. medicine), we need tools for building confidence in the system and evaluating whether it is reliable. Methods to improve model reliability often require new learning algorithms (e.g. using Bayesian inference to obtain uncertainty estimates). An alternative is to audit a model after it is trained. In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions. Intuitively, RUE estimates the amount that a prediction would change if the model had been fit on different training data. The algorithm uses the gradient and Hessian of the model's loss function to create an ensemble of predictions. Experimentally, we show that RUE more effectively detects inaccurate predictions than existing tools for auditing reliability subsequent to training. We also show that RUE can create predictive distributions that are competitive with state-of-the-art methods like Monte Carlo dropout, probabilistic backpropagation, and deep ensembles, but does not depend on specific algorithms at train-time like these methods do. △ Less

Submitted 28 February, 2019; v1 submitted 2 January, 2019; originally announced January 2019.

Comments: To appear in the proceedings of Artificial Intelligence and Statistics (AISTATS) 2019

arXiv:1812.04597 [pdf, other]

Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport

Authors: Adarsh Subbaswamy, Peter Schulam, Suchi Saria

Abstract: Classical supervised learning produces unreliable models when training and target distributions differ, with most existing solutions requiring samples from the target domain. We propose a proactive approach which learns a relationship in the training domain that will generalize to the target domain by incorporating prior knowledge of aspects of the data generating process that are expected to diff… ▽ More Classical supervised learning produces unreliable models when training and target distributions differ, with most existing solutions requiring samples from the target domain. We propose a proactive approach which learns a relationship in the training domain that will generalize to the target domain by incorporating prior knowledge of aspects of the data generating process that are expected to differ as expressed in a causal selection diagram. Specifically, we remove variables generated by unstable mechanisms from the joint factorization to yield the Surgery Estimator---an interventional distribution that is invariant to the differences across environments. We prove that the surgery estimator finds stable relationships in strictly more scenarios than previous approaches which only consider conditional relationships, and demonstrate this in simulated experiments. We also evaluate on real world data for which the true causal diagram is unknown, performing competitively against entirely data-driven approaches. △ Less

Submitted 28 February, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

Comments: In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. Previously presented at the NeurIPS 2018 Causal Learning Workshop

arXiv:1810.03025 [pdf, other]

Discretizing Logged Interaction Data Biases Learning for Decision-Making

Authors: Peter Schulam, Suchi Saria

Abstract: Time series data that are not measured at regular intervals are commonly discretized as a preprocessing step. For example, data about customer arrival times might be simplified by summing the number of arrivals within hourly intervals, which produces a discrete-time time series that is easier to model. In this abstract, we show that discretization introduces a bias that affects models trained for… ▽ More Time series data that are not measured at regular intervals are commonly discretized as a preprocessing step. For example, data about customer arrival times might be simplified by summing the number of arrivals within hourly intervals, which produces a discrete-time time series that is easier to model. In this abstract, we show that discretization introduces a bias that affects models trained for decision-making. We refer to this phenomenon as discretization bias, and show that we can avoid it by using continuous-time models instead. △ Less

Submitted 6 October, 2018; originally announced October 2018.

Comments: This is a standalone short paper describing a new type of bias that can arise when learning from time series data for sequential decision-making problems

arXiv:1808.03253 [pdf, other]

Counterfactual Normalization: Proactively Addressing Dataset Shift and Improving Reliability Using Causal Mechanisms

Authors: Adarsh Subbaswamy, Suchi Saria

Abstract: Predictive models can fail to generalize from training to deployment environments because of dataset shift, posing a threat to model reliability and the safety of downstream decisions made in practice. Instead of using samples from the target distribution to reactively correct dataset shift, we use graphical knowledge of the causal mechanisms relating variables in a prediction problem to proactive… ▽ More Predictive models can fail to generalize from training to deployment environments because of dataset shift, posing a threat to model reliability and the safety of downstream decisions made in practice. Instead of using samples from the target distribution to reactively correct dataset shift, we use graphical knowledge of the causal mechanisms relating variables in a prediction problem to proactively remove relationships that do not generalize across environments, even when these relationships may depend on unobserved variables (violations of the "no unobserved confounders" assumption). To accomplish this, we identify variables with unstable paths of statistical influence and remove them from the model. We also augment the causal graph with latent counterfactual variables that isolate unstable paths of statistical influence, allowing us to retain stable paths that would otherwise be removed. Our experiments demonstrate that models that remove vulnerable variables and use estimates of the latent variables transfer better, often outperforming in the target domain despite some accuracy loss in the training domain. △ Less

Submitted 9 August, 2018; originally announced August 2018.

Comments: Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI), 2018. Revised from print version

arXiv:1708.04757 [pdf, other]

Scalable Joint Models for Reliable Uncertainty-Aware Event Prediction

Authors: Hossein Soleimani, James Hensman, Suchi Saria

Abstract: Missing data and noisy observations pose significant challenges for reliably predicting events from irregularly sampled multivariate time series (longitudinal) data. Imputation methods, which are typically used for completing the data prior to event prediction, lack a principled mechanism to account for the uncertainty due to missingness. Alternatively, state-of-the-art joint modeling techniques c… ▽ More Missing data and noisy observations pose significant challenges for reliably predicting events from irregularly sampled multivariate time series (longitudinal) data. Imputation methods, which are typically used for completing the data prior to event prediction, lack a principled mechanism to account for the uncertainty due to missingness. Alternatively, state-of-the-art joint modeling techniques can be used for jointly modeling the longitudinal and event data and compute event probabilities conditioned on the longitudinal observations. These approaches, however, make strong parametric assumptions and do not easily scale to multivariate signals with many observations. Our proposed approach consists of several key innovations. First, we develop a flexible and scalable joint model based upon sparse multiple-output Gaussian processes. Unlike state-of-the-art joint models, the proposed model can explain highly challenging structure including non-Gaussian noise while scaling to large data. Second, we derive an optimal policy for predicting events using the distribution of the event occurrence estimated by the joint model. The derived policy trades-off the cost of a delayed detection versus incorrect assessments and abstains from making decisions when the estimated event probability does not satisfy the derived confidence criteria. Experiments on a large dataset show that the proposed framework significantly outperforms state-of-the-art techniques in event prediction. △ Less

Submitted 15 August, 2017; originally announced August 2017.

Comments: To appear in IEEE Transaction on Pattern Analysis and Machine Intelligence

arXiv:1704.02038 [pdf, other]

Treatment-Response Models for Counterfactual Reasoning with Continuous-time, Continuous-valued Interventions

Authors: Hossein Soleimani, Adarsh Subbaswamy, Suchi Saria

Abstract: Treatment effects can be estimated from observational data as the difference in potential outcomes. In this paper, we address the challenge of estimating the potential outcome when treatment-dose levels can vary continuously over time. Further, the outcome variable may not be measured at a regular frequency. Our proposed solution represents the treatment response curves using linear time-invariant… ▽ More Treatment effects can be estimated from observational data as the difference in potential outcomes. In this paper, we address the challenge of estimating the potential outcome when treatment-dose levels can vary continuously over time. Further, the outcome variable may not be measured at a regular frequency. Our proposed solution represents the treatment response curves using linear time-invariant dynamical systems---this provides a flexible means for modeling response over time to highly variable dose curves. Moreover, for multivariate data, the proposed method: uncovers shared structure in treatment response and the baseline across multiple markers; and, flexibly models challenging correlation structure both across and within signals over time. For this, we build upon the framework of multiple-output Gaussian Processes. On simulated and a challenging clinical dataset, we show significant gains in accuracy over state-of-the-art models. △ Less

Submitted 4 November, 2017; v1 submitted 6 April, 2017; originally announced April 2017.

Comments: In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence (UAI-2017), Sydney, Australia, August 2017. The first two authors contributed equally to this work

arXiv:1703.10651 [pdf, other]

Reliable Decision Support using Counterfactual Models

Authors: Peter Schulam, Suchi Saria

Abstract: Decision-makers are faced with the challenge of estimating what is likely to happen when they take an action. For instance, if I choose not to treat this patient, are they likely to die? Practitioners commonly use supervised learning algorithms to fit predictive models that help decision-makers reason about likely future outcomes, but we show that this approach is unreliable, and sometimes even da… ▽ More Decision-makers are faced with the challenge of estimating what is likely to happen when they take an action. For instance, if I choose not to treat this patient, are they likely to die? Practitioners commonly use supervised learning algorithms to fit predictive models that help decision-makers reason about likely future outcomes, but we show that this approach is unreliable, and sometimes even dangerous. The key issue is that supervised learning algorithms are highly sensitive to the policy used to choose actions in the training data, which causes the model to capture relationships that do not generalize. We propose using a different learning objective that predicts counterfactuals instead of predicting outcomes under an existing action policy as in supervised learning. To support decision-making in temporal settings, we introduce the Counterfactual Gaussian Process (CGP) to predict the counterfactual future progression of continuous-time trajectories under sequences of future actions. We demonstrate the benefits of the CGP on two important decision-support tasks: risk prediction and "what if?" reasoning for individualized treatment planning. △ Less

Submitted 1 February, 2018; v1 submitted 30 March, 2017; originally announced March 2017.

Comments: Published in the proceedings of Neural Information Processing Systems (NIPS) 2017

arXiv:1608.05182 [pdf, other]

A Bayesian Nonparametric Approach for Estimating Individualized Treatment-Response Curves

Authors: Yanbo Xu, Yanxun Xu, Suchi Saria

Abstract: We study the problem of estimating the continuous response over time to interventions using observational time series---a retrospective dataset where the policy by which the data are generated is unknown to the learner. We are motivated by applications where response varies by individuals and therefore, estimating responses at the individual-level is valuable for personalizing decision-making. We… ▽ More We study the problem of estimating the continuous response over time to interventions using observational time series---a retrospective dataset where the policy by which the data are generated is unknown to the learner. We are motivated by applications where response varies by individuals and therefore, estimating responses at the individual-level is valuable for personalizing decision-making. We refer to this as the problem of estimating individualized treatment response (ITR) curves. In statistics, G-computation formula (Robins, 1986) has been commonly used for estimating treatment responses from observational data containing sequential treatment assignments. However, past studies have focused predominantly on obtaining point-in-time estimates at the population level. We leverage the G-computation formula and develop a novel Bayesian nonparametric (BNP) method that can flexibly model functional data and provide posterior inference over the treatment response curves at both the individual and population level. On a challenging dataset containing time series from patients admitted to a hospital, we estimate responses to treatments used in managing kidney function and show that the resulting fits are more accurate than alternative approaches. Accurate methods for obtaining ITRs from observational data can dramatically accelerate the pace at which personalized treatment plans become possible. △ Less

Submitted 10 December, 2016; v1 submitted 18 August, 2016; originally announced August 2016.

arXiv:1604.05819 [pdf, other]

Trading-Off Cost of Deployment Versus Accuracy in Learning Predictive Models

Authors: Daniel P. Robinson, Suchi Saria

Abstract: Predictive models are finding an increasing number of applications in many industries. As a result, a practical means for trading-off the cost of deploying a model versus its effectiveness is needed. Our work is motivated by risk prediction problems in healthcare. Cost-structures in domains such as healthcare are quite complex, posing a significant challenge to existing approaches. We propose a no… ▽ More Predictive models are finding an increasing number of applications in many industries. As a result, a practical means for trading-off the cost of deploying a model versus its effectiveness is needed. Our work is motivated by risk prediction problems in healthcare. Cost-structures in domains such as healthcare are quite complex, posing a significant challenge to existing approaches. We propose a novel framework for designing cost-sensitive structured regularizers that is suitable for problems with complex cost dependencies. We draw upon a surprising connection to boolean circuits. In particular, we represent the problem costs as a multi-layer boolean circuit, and then use properties of boolean circuits to define an extended feature vector and a group regularizer that exactly captures the underlying cost structure. The resulting regularizer may then be combined with a fidelity function to perform model prediction, for example. For the challenging real-world application of risk prediction for sepsis in intensive care units, the use of our regularizer leads to models that are in harmony with the underlying cost structure and thus provide an excellent prediction accuracy versus cost tradeoff. △ Less

Submitted 20 April, 2016; originally announced April 2016.

Comments: Authors contributed equally to this work. To appear in IJCAI 2016, Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

arXiv:1601.04674 [pdf, other]

A Framework for Individualizing Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure

Authors: Peter Schulam, Suchi Saria

Abstract: For many complex diseases, there is a wide variety of ways in which an individual can manifest the disease. The challenge of personalized medicine is to develop tools that can accurately predict the trajectory of an individual's disease, which can in turn enable clinicians to optimize treatments. We represent an individual's disease trajectory as a continuous-valued continuous-time function descri… ▽ More For many complex diseases, there is a wide variety of ways in which an individual can manifest the disease. The challenge of personalized medicine is to develop tools that can accurately predict the trajectory of an individual's disease, which can in turn enable clinicians to optimize treatments. We represent an individual's disease trajectory as a continuous-valued continuous-time function describing the severity of the disease over time. We propose a hierarchical latent variable model that individualizes predictions of disease trajectories. This model shares statistical strength across observations at different resolutions--the population, subpopulation and the individual level. We describe an algorithm for learning population and subpopulation parameters offline, and an online procedure for dynamically learning individual-specific parameters. Finally, we validate our model on the task of predicting the course of interstitial lung disease, a leading cause of death among patients with the autoimmune disease scleroderma. We compare our approach against state-of-the-art and demonstrate significant improvements in predictive accuracy. △ Less

Submitted 21 January, 2016; v1 submitted 18 January, 2016; originally announced January 2016.

Comments: Appeared in Neural Information Processing Systems (NIPS) 2015

arXiv:1507.07295 [pdf, other]

Learning (Predictive) Risk Scores in the Presence of Censoring due to Interventions

Authors: Kirill Dyagilev, Suchi Saria

Abstract: A large and diverse set of measurements are regularly collected during a patient's hospital stay to monitor their health status. Tools for integrating these measurements into severity scores, that accurately track changes in illness severity, can improve clinicians ability to provide timely interventions. Existing approaches for creating such scores either 1) rely on experts to fully specify the s… ▽ More A large and diverse set of measurements are regularly collected during a patient's hospital stay to monitor their health status. Tools for integrating these measurements into severity scores, that accurately track changes in illness severity, can improve clinicians ability to provide timely interventions. Existing approaches for creating such scores either 1) rely on experts to fully specify the severity score, or 2) train a predictive score, using supervised learning, by regressing against a surrogate marker of severity such as the presence of downstream adverse events. The first approach does not extend to diseases where an accurate score cannot be elicited from experts. The second approach often produces scores that suffer from bias due to treatment-related censoring (Paxton, 2013). We propose a novel ranking based framework for disease severity score learning (DSSL). DSSL exploits the following key observation: while it is challenging for experts to quantify the disease severity at any given time, it is often easy to compare the disease severity at two different times. Extending existing ranking algorithms, DSSL learns a function that maps a vector of patient's measurements to a scalar severity score such that the resulting score is temporally smooth and consistent with the expert's ranking of pairs of disease states. We apply DSSL to the problem of learning a sepsis severity score using a large, real-world dataset. The learned scores significantly outperform state-of-the-art clinical scores in ranking patient states by severity and in early detection of future adverse events. We also show that the learned disease severity trajectories are consistent with clinical expectations of disease evolution. Further, using simulated datasets, we show that DSSL exhibits better generalization performance to changes in treatment patterns compared to the above approaches. △ Less

Submitted 26 July, 2015; originally announced July 2015.

Journal ref: Machine Learning Journal, Special Issue on on Machine Learning for Health and Medicine, pp. 1-26, 2015

arXiv:1008.2028 [pdf, ps, other]

Discovering shared and individual latent structure in multiple time series

Authors: Suchi Saria, Daphne Koller, Anna Penn

Abstract: This paper proposes a nonparametric Bayesian method for exploratory data analysis and feature construction in continuous time series. Our method focuses on understanding shared features in a set of time series that exhibit significant individual variability. Our method builds on the framework of latent Diricihlet allocation (LDA) and its extension to hierarchical Dirichlet processes, which allows… ▽ More This paper proposes a nonparametric Bayesian method for exploratory data analysis and feature construction in continuous time series. Our method focuses on understanding shared features in a set of time series that exhibit significant individual variability. Our method builds on the framework of latent Diricihlet allocation (LDA) and its extension to hierarchical Dirichlet processes, which allows us to characterize each series as switching between latent ``topics'', where each topic is characterized as a distribution over ``words'' that specify the series dynamics. However, unlike standard applications of LDA, we discover the words as we learn the model. We apply this model to the task of tracking the physiological signals of premature infants; our model obtains clinically significant insights as well as useful features for supervised learning tasks. △ Less

Submitted 11 August, 2010; originally announced August 2010.

Comments: Additional supplementary section in tex file

Showing 1–24 of 24 results for author: Saria, S