-
Removing Spurious Correlation from Neural Network Interpretations
Authors:
Milad Fotouhi,
Mohammad Taha Bahadori,
Oluwaseyi Feyisetan,
Payman Arabshahi,
David Heckerman
Abstract:
The existing algorithms for identification of neurons responsible for undesired and harmful behaviors do not consider the effects of confounders such as topic of the conversation. In this work, we show that confounders can create spurious correlations and propose a new causal mediation approach that controls the impact of the topic. In experiments with two large language models, we study the local…
▽ More
The existing algorithms for identification of neurons responsible for undesired and harmful behaviors do not consider the effects of confounders such as topic of the conversation. In this work, we show that confounders can create spurious correlations and propose a new causal mediation approach that controls the impact of the topic. In experiments with two large language models, we study the localization hypothesis and show that adjusting for the effect of conversation topic, toxicity becomes less localized.
△ Less
Submitted 3 December, 2024;
originally announced December 2024.
-
Fast Training Dataset Attribution via In-Context Learning
Authors:
Milad Fotouhi,
Mohammad Taha Bahadori,
Oluwaseyi Feyisetan,
Payman Arabshahi,
David Heckerman
Abstract:
We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data in the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem…
▽ More
We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data in the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
△ Less
Submitted 18 March, 2025; v1 submitted 14 August, 2024;
originally announced August 2024.
-
Multiply-Robust Causal Change Attribution
Authors:
Victor Quintas-Martinez,
Mohammad Taha Bahadori,
Eduardo Santiago,
Jeff Mu,
Dominik Janzing,
David Heckerman
Abstract:
Comparing two samples of data, we observe a change in the distribution of an outcome variable. In the presence of multiple explanatory variables, how much of the change can be explained by each possible cause? We develop a new estimation strategy that, given a causal model, combines regression and re-weighting methods to quantify the contribution of each causal mechanism. Our proposed methodology…
▽ More
Comparing two samples of data, we observe a change in the distribution of an outcome variable. In the presence of multiple explanatory variables, how much of the change can be explained by each possible cause? We develop a new estimation strategy that, given a causal model, combines regression and re-weighting methods to quantify the contribution of each causal mechanism. Our proposed methodology is multiply robust, meaning that it still recovers the target parameter under partial misspecification. We prove that our estimator is consistent and asymptotically normal. Moreover, it can be incorporated into existing frameworks for causal attribution, such as Shapley values, which will inherit the consistency and large-sample distribution properties. Our method demonstrates excellent performance in Monte Carlo simulations, and we show its usefulness in an empirical application. Our method is implemented as part of the Python library DoWhy (arXiv:2011.04216, arXiv:2206.06821).
△ Less
Submitted 5 September, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
End-to-End Balancing for Causal Continuous Treatment-Effect Estimation
Authors:
Mohammad Taha Bahadori,
Eric Tchetgen Tchetgen,
David E. Heckerman
Abstract:
We study the problem of observational causal inference with continuous treatments in the framework of inverse propensity-score weighting. To obtain stable weights, we design a new algorithm based on entropy balancing that learns weights to directly maximize causal inference accuracy using end-to-end optimization. In the process of optimization, these weights are automatically tuned to the specific…
▽ More
We study the problem of observational causal inference with continuous treatments in the framework of inverse propensity-score weighting. To obtain stable weights, we design a new algorithm based on entropy balancing that learns weights to directly maximize causal inference accuracy using end-to-end optimization. In the process of optimization, these weights are automatically tuned to the specific dataset and causal inference algorithm being used. We provide a theoretical analysis demonstrating consistency of our approach. Using synthetic and real-world data, we show that our algorithm estimates causal effect more accurately than baseline entropy balancing.
△ Less
Submitted 10 July, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
-
Debiasing Concept-based Explanations with Causal Analysis
Authors:
Mohammad Taha Bahadori,
David E. Heckerman
Abstract:
Concept-based explanation approach is a popular model interpertability tool because it expresses the reasons for a model's predictions in terms of concepts that are meaningful for the domain experts. In this work, we study the problem of the concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impacts of unobserved variables a…
▽ More
Concept-based explanation approach is a popular model interpertability tool because it expresses the reasons for a model's predictions in terms of concepts that are meaningful for the domain experts. In this work, we study the problem of the concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impacts of unobserved variables and a method to remove the impact of confounding information and noise using a two-stage regression technique borrowed from the instrumental variable literature. We also model the completeness of the concepts set and show that our debiasing method works when the concepts are not complete. Our synthetic and real-world experiments demonstrate the success of our method in removing biases and improving the ranking of the concepts in terms of their contribution to the explanation of the predictions.
△ Less
Submitted 22 May, 2021; v1 submitted 22 July, 2020;
originally announced July 2020.
-
Discovering Invariances in Healthcare Neural Networks
Authors:
Mohammad Taha Bahadori,
Layne C. Price
Abstract:
We study the invariance characteristics of pre-trained predictive models by empirically learning transformations on the input that leave the prediction function approximately unchanged. To learn invariant transformations, we minimize the Wasserstein distance between the predictive distribution conditioned on the data instances and the predictive distribution conditioned on the transformed data ins…
▽ More
We study the invariance characteristics of pre-trained predictive models by empirically learning transformations on the input that leave the prediction function approximately unchanged. To learn invariant transformations, we minimize the Wasserstein distance between the predictive distribution conditioned on the data instances and the predictive distribution conditioned on the transformed data instances. To avoid finding degenerate or perturbative transformations, we add a similarity regularization to discourage similarity between the data and its transformed values. We theoretically analyze the correctness of the algorithm and the structure of the solutions. Applying the proposed technique to clinical time series data, we discover variables that commonly-used LSTM models do not rely on for their prediction, especially when the LSTM is trained to be adversarially robust. We also analyze the invariances of BioBERT on clinical notes and discover words that it is invariant to.
△ Less
Submitted 3 March, 2020; v1 submitted 8 November, 2019;
originally announced November 2019.
-
Temporal-Clustering Invariance in Irregular Healthcare Time Series
Authors:
Mohammad Taha Bahadori,
Zachary Chase Lipton
Abstract:
Electronic records contain sequences of events, some of which take place all at once in a single visit, and others that are dispersed over multiple visits, each with a different timestamp. We postulate that fine temporal detail, e.g., whether a series of blood tests are completed at once or in rapid succession should not alter predictions based on this data. Motivated by this intuition, we propose…
▽ More
Electronic records contain sequences of events, some of which take place all at once in a single visit, and others that are dispersed over multiple visits, each with a different timestamp. We postulate that fine temporal detail, e.g., whether a series of blood tests are completed at once or in rapid succession should not alter predictions based on this data. Motivated by this intuition, we propose models for analyzing sequences of multivariate clinical time series data that are invariant to this temporal clustering. We propose an efficient data augmentation technique that exploits the postulated temporal-clustering invariance to regularize deep neural networks optimized for several clinical prediction tasks. We introduce two techniques to temporally coarsen (downsample) irregular time series: (i) grouping the data points based on regularly-spaced timestamps; and (ii) clustering them, yielding irregularly-paced timestamps. Moreover, we propose a MultiResolution Ensemble (MRE) model, improving predictive accuracy by ensembling predictions based on inputs sequences transformed by different coarsening operators. Our experiments show that MRE improves the mAP on the benchmark mortality prediction task from 51.53% to 53.92%.
△ Less
Submitted 27 April, 2019;
originally announced April 2019.
-
Improving Hospital Mortality Prediction with Medical Named Entities and Multimodal Learning
Authors:
Mengqi Jin,
Mohammad Taha Bahadori,
Aaron Colak,
Parminder Bhatia,
Busra Celikkaya,
Ram Bhakta,
Selvan Senthivel,
Mohammed Khalilia,
Daniel Navarro,
Borui Zhang,
Tiberiu Doman,
Arun Ravi,
Matthieu Liger,
Taha Kass-hout
Abstract:
Clinical text provides essential information to estimate the acuity of a patient during hospital stays in addition to structured clinical data. In this study, we explore how clinical text can complement a clinical predictive learning task. We leverage an internal medical natural language processing service to perform named entity extraction and negation detection on clinical notes and compose sele…
▽ More
Clinical text provides essential information to estimate the acuity of a patient during hospital stays in addition to structured clinical data. In this study, we explore how clinical text can complement a clinical predictive learning task. We leverage an internal medical natural language processing service to perform named entity extraction and negation detection on clinical notes and compose selected entities into a new text corpus to train document representations. We then propose a multimodal neural network to jointly train time series signals and unstructured clinical text representations to predict the in-hospital mortality risk for ICU patients. Our model outperforms the benchmark by 2% AUC.
△ Less
Submitted 3 December, 2018; v1 submitted 29 November, 2018;
originally announced November 2018.
-
Causal Regularization
Authors:
Mohammad Taha Bahadori,
Krzysztof Chalupka,
Edward Choi,
Robert Chen,
Walter F. Stewart,
Jimeng Sun
Abstract:
In application domains such as healthcare, we want accurate predictive models that are also causally interpretable. In pursuit of such models, we propose a causal regularizer to steer predictive models towards causally-interpretable solutions and theoretically study its properties. In a large-scale analysis of Electronic Health Records (EHR), our causally-regularized model outperforms its L1-regul…
▽ More
In application domains such as healthcare, we want accurate predictive models that are also causally interpretable. In pursuit of such models, we propose a causal regularizer to steer predictive models towards causally-interpretable solutions and theoretically study its properties. In a large-scale analysis of Electronic Health Records (EHR), our causally-regularized model outperforms its L1-regularized counterpart in causal accuracy and is competitive in predictive performance. We perform non-linear causality analysis by causally regularizing a special neural network architecture. We also show that the proposed causal regularizer can be used together with neural representation learning algorithms to yield up to 20% improvement over multilayer perceptron in detecting multivariate causation, a situation common in healthcare, where many causal factors should occur simultaneously to have an effect on the target variable.
△ Less
Submitted 23 February, 2017; v1 submitted 8 February, 2017;
originally announced February 2017.
-
GRAM: Graph-based Attention Model for Healthcare Representation Learning
Authors:
Edward Choi,
Mohammad Taha Bahadori,
Le Song,
Walter F. Stewart,
Jimeng Sun
Abstract:
Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain: -Data insufficiency:Often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results. -Interpretation:The representations learned by deep learning methods should align with medical knowledge. To address the…
▽ More
Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain: -Data insufficiency:Often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results. -Interpretation:The representations learned by deep learning methods should align with medical knowledge. To address these challenges, we propose a GRaph-based Attention Model, GRAM that supplements electronic health records (EHR) with hierarchical information inherent to medical ontologies. Based on the data volume and the ontology structure, GRAM represents a medical concept as a combination of its ancestors in the ontology via an attention mechanism. We compared predictive performance (i.e. accuracy, data needs, interpretability) of GRAM to various methods including the recurrent neural network (RNN) in two sequential diagnoses prediction tasks and one heart failure prediction task. Compared to the basic RNN, GRAM achieved 10% higher accuracy for predicting diseases rarely observed in the training data and 3% improved area under the ROC curve for predicting heart failure using an order of magnitude less training data. Additionally, unlike other methods, the medical concept representations learned by GRAM are well aligned with the medical ontology. Finally, GRAM exhibits intuitive attention behaviors by adaptively generalizing to higher level concepts when facing data insufficiency at the lower level concepts.
△ Less
Submitted 1 April, 2017; v1 submitted 21 November, 2016;
originally announced November 2016.
-
RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism
Authors:
Edward Choi,
Mohammad Taha Bahadori,
Joshua A. Kulas,
Andy Schuetz,
Walter F. Stewart,
Jimeng Sun
Abstract:
Accuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine where both accuracy and interpretability are importan…
▽ More
Accuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine where both accuracy and interpretability are important. We addressed this challenge by developing the REverse Time AttentIoN model (RETAIN) for application to Electronic Health Records (EHR) data. RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g. key diagnoses). RETAIN mimics physician practice by attending the EHR data in a reverse time order so that recent clinical visits are likely to receive higher attention. RETAIN was tested on a large health system EHR dataset with 14 million visits completed by 263K patients over an 8 year period and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNN, and ease of interpretability comparable to traditional models.
△ Less
Submitted 26 February, 2017; v1 submitted 19 August, 2016;
originally announced August 2016.
-
Scalable Interpretable Multi-Response Regression via SEED
Authors:
Mohammad Taha Bahadori,
Zemin Zheng,
Yan Liu,
Jinchi Lv
Abstract:
Sparse reduced-rank regression is an important tool to uncover meaningful dependence structure between large numbers of predictors and responses in many big data applications such as genome-wide association studies and social media analysis. Despite the recent theoretical and algorithmic advances, scalable estimation of sparse reduced-rank regression remains largely unexplored. In this paper, we s…
▽ More
Sparse reduced-rank regression is an important tool to uncover meaningful dependence structure between large numbers of predictors and responses in many big data applications such as genome-wide association studies and social media analysis. Despite the recent theoretical and algorithmic advances, scalable estimation of sparse reduced-rank regression remains largely unexplored. In this paper, we suggest a scalable procedure called sequential estimation with eigen-decomposition (SEED) which needs only a single top-$r$ singular value decomposition to find the optimal low-rank and sparse matrix by solving a sparse generalized eigenvalue problem. Our suggested method is not only scalable but also performs simultaneous dimensionality reduction and variable selection. Under some mild regularity conditions, we show that SEED enjoys nice sampling properties including consistency in estimation, rank selection, prediction, and model selection. Numerical studies on synthetic and real data sets show that SEED outperforms the state-of-the-art approaches for large-scale matrix estimation problem.
△ Less
Submitted 12 August, 2016;
originally announced August 2016.
-
FLASH: Fast Bayesian Optimization for Data Analytic Pipelines
Authors:
Yuyu Zhang,
Mohammad Taha Bahadori,
Hang Su,
Jimeng Sun
Abstract:
Modern data science relies on data analytic pipelines to organize interdependent computational steps. Such analytic pipelines often involve different algorithms across multiple steps, each with its own hyperparameters. To achieve the best performance, it is often critical to select optimal algorithms and to set appropriate hyperparameters, which requires large computational efforts. Bayesian optim…
▽ More
Modern data science relies on data analytic pipelines to organize interdependent computational steps. Such analytic pipelines often involve different algorithms across multiple steps, each with its own hyperparameters. To achieve the best performance, it is often critical to select optimal algorithms and to set appropriate hyperparameters, which requires large computational efforts. Bayesian optimization provides a principled way for searching optimal hyperparameters for a single algorithm. However, many challenges remain in solving pipeline optimization problems with high-dimensional and highly conditional search space. In this work, we propose Fast LineAr SearcH (FLASH), an efficient method for tuning analytic pipelines. FLASH is a two-layer Bayesian optimization framework, which firstly uses a parametric model to select promising algorithms, then computes a nonparametric model to fine-tune hyperparameters of the promising algorithms. FLASH also includes an effective caching algorithm which can further accelerate the search process. Extensive experiments on a number of benchmark datasets have demonstrated that FLASH significantly outperforms previous state-of-the-art methods in both search speed and accuracy. Using 50% of the time budget, FLASH achieves up to 20% improvement on test error rate compared to the baselines. FLASH also yields state-of-the-art performance on a real-world application for healthcare predictive modeling.
△ Less
Submitted 23 June, 2016; v1 submitted 20 February, 2016;
originally announced February 2016.
-
Multi-layer Representation Learning for Medical Concepts
Authors:
Edward Choi,
Mohammad Taha Bahadori,
Elizabeth Searles,
Catherine Coffey,
Jimeng Sun
Abstract:
Learning efficient representations for concepts has been proven to be an important basis for many applications such as machine translation or document classification. Proper representations of medical concepts such as diagnosis, medication, procedure codes and visits will have broad applications in healthcare analytics. However, in Electronic Health Records (EHR) the visit sequences of patients in…
▽ More
Learning efficient representations for concepts has been proven to be an important basis for many applications such as machine translation or document classification. Proper representations of medical concepts such as diagnosis, medication, procedure codes and visits will have broad applications in healthcare analytics. However, in Electronic Health Records (EHR) the visit sequences of patients include multiple concepts (diagnosis, procedure, and medication codes) per visit. This structure provides two types of relational information, namely sequential order of visits and co-occurrence of the codes within each visit. In this work, we propose Med2Vec, which not only learns distributed representations for both medical codes and visits from a large EHR dataset with over 3 million visits, but also allows us to interpret the learned representations confirmed positively by clinical experts. In the experiments, Med2Vec displays significant improvement in key medical applications compared to popular baselines such as Skip-gram, GloVe and stacked autoencoder, while providing clinically meaningful interpretation.
△ Less
Submitted 17 February, 2016;
originally announced February 2016.
-
Doctor AI: Predicting Clinical Events via Recurrent Neural Networks
Authors:
Edward Choi,
Mohammad Taha Bahadori,
Andy Schuetz,
Walter F. Stewart,
Jimeng Sun
Abstract:
Leveraging large historical data in electronic health record (EHR), we developed Doctor AI, a generic predictive model that covers observed medical conditions and medication uses. Doctor AI is a temporal model using recurrent neural networks (RNN) and was developed and applied to longitudinal time stamped EHR data from 260K patients over 8 years. Encounter records (e.g. diagnosis codes, medication…
▽ More
Leveraging large historical data in electronic health record (EHR), we developed Doctor AI, a generic predictive model that covers observed medical conditions and medication uses. Doctor AI is a temporal model using recurrent neural networks (RNN) and was developed and applied to longitudinal time stamped EHR data from 260K patients over 8 years. Encounter records (e.g. diagnosis codes, medication codes or procedure codes) were input to RNN to predict (all) the diagnosis and medication categories for a subsequent visit. Doctor AI assesses the history of patients to make multilabel predictions (one label for each diagnosis or medication category). Based on separate blind test set evaluation, Doctor AI can perform differential diagnosis with up to 79% recall@30, significantly higher than several baselines. Moreover, we demonstrate great generalizability of Doctor AI by adapting the resulting models from one institution to another without losing substantial accuracy.
△ Less
Submitted 28 September, 2016; v1 submitted 18 November, 2015;
originally announced November 2015.