-
LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs
Authors:
Benjamin J. Lengerich,
Sebastian Bordt,
Harsha Nori,
Mark E. Nunnally,
Yin Aphinyanaphongs,
Manolis Kellis,
Rich Caruana
Abstract:
We show that large language models (LLMs) are remarkably good at working with interpretable models that decompose complex outcomes into univariate graph-represented components. By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries without ever requiring the entire model to fit in context. This approach enables LLMs to apply their extensive background knowledge to automate common tasks in data science such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove the anomalies. We use multiple examples in healthcare to demonstrate the utility of these new capabilities of LLMs, with particular emphasis on Generalized Additive Models (GAMs). Finally, we present the package $\texttt{TalkToEBM}$ as an open-source LLM-GAM interface.
Submitted 7 August, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Observational Causal Inference in Novel Diseases: A Case Study of COVID-19
Authors:
Alexander Peysakhovich,
Yin Aphinyanaphongs
Abstract:
A key issue for all observational causal inference is that it relies on an unverifiable assumption - that observed characteristics are sufficient to proxy for treatment confounding. In this paper we argue that in medical cases these conditions are more likely to be met in cases where standardized treatment guidelines do not yet exist. One example of such a situation is the emergence of a novel disease. We study the case of early COVID-19 in New York City hospitals and show that observational analysis of two important therapeutics, anti-coagulation and steroid therapy, gives results that agree with later guidelines issued via combinations of randomized trials and other evidence. We also argue that observational causal inference cannot be applied mechanically and requires domain expertise from the analyst, by showing a cautionary tale of a treatment that appears extremely promising in the data, but whose result is due to a quirk of hospital policy.
Submitted 13 March, 2023;
originally announced March 2023.
-
Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations
Authors:
Neil Jethani,
Mukund Sudarshan,
Yindalon Aphinyanaphongs,
Rajesh Ranganath
Abstract:
While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or are hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor model in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method, which learn a predictor model that approximates the true data generating distribution given any subset of the input. We show EVAL-X can detect when predictions are encoded in interpretations and show the advantages of REAL-X through quantitative and radiologist evaluation.
Submitted 2 March, 2021;
originally announced March 2021.
-
Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research
Authors:
Vincent Major,
Alisa Surkis,
Yindalon Aphinyanaphongs
Abstract:
Conventional text classification models make a bag-of-words assumption, reducing text to word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an entirely unsupervised manner using a contextual window, and of doing so much faster than previous methods. Each word is projected into vector space such that words with similar meanings, such as "strong" and "powerful", are projected into the same general Euclidean space. Open questions about these embeddings include their utility across classification tasks and the optimal properties and source of documents to construct broadly functional embeddings. In this work, we demonstrate the usefulness of pre-trained embeddings for classification in our task and demonstrate that custom word embeddings, built in the domain and for the tasks, can improve performance over word embeddings learnt on more general data including news articles or Wikipedia.
Submitted 9 July, 2018; v1 submitted 17 May, 2017;
originally announced May 2017.
-
A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations
Authors:
Josua Krause,
Aritra Dasgupta,
Jordan Swartz,
Yindalon Aphinyanaphongs,
Enrico Bertini
Abstract:
Human-in-the-loop data analysis applications necessitate greater transparency in machine learning models for experts to understand and trust their decisions. To this end, we propose a visual analytics workflow to help data scientists and domain experts explore, diagnose, and understand the decisions made by a binary classifier. The approach leverages "instance-level explanations", measures of local feature relevance that explain single instances, and uses them to build a set of visual representations that guide the users in their investigation. The workflow is based on three main visual representations and steps: one based on aggregate statistics to see how data are distributed across correct / incorrect decisions; one based on explanations to understand which features are used to make these decisions; and one based on raw data, to derive insights on potential root causes for the observed patterns. The workflow is derived from a long-term collaboration with a group of machine learning and healthcare professionals who used our method to make sense of machine learning models they developed. The case study from this collaboration demonstrates that the proposed workflow helps experts derive useful knowledge about the model and the phenomena it describes, so that experts can generate useful hypotheses on how a model can be improved.
Submitted 1 October, 2017; v1 submitted 4 May, 2017;
originally announced May 2017.