-
Mitigating the Risk of Health Inequity Exacerbated by Large Language Models
Authors:
Yuelyu Ji,
Wenhe Ma,
Sonish Sivarajkumar,
Hang Zhang,
Eugene Mathew Sadhu,
Zhuochun Li,
Xizhi Wu,
Shyam Visweswaran,
Yanshan Wang
Abstract:
Recent advancements in large language models have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question answering for clinical decision support. However, our study shows that incorporating non decisive sociodemographic factors such as race, sex, income level, LGBT+ status, homeless…
▽ More
Recent advancements in large language models have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question answering for clinical decision support. However, our study shows that incorporating non decisive sociodemographic factors such as race, sex, income level, LGBT+ status, homelessness, illiteracy, disability, and unemployment into the input of LLMs can lead to incorrect and harmful outputs for these populations. These discrepancies risk exacerbating existing health disparities if LLMs are widely adopted in healthcare. To address this issue, we introduce EquityGuard, a novel framework designed to detect and mitigate the risk of health inequities in LLM based medical applications. Our evaluation demonstrates its efficacy in promoting equitable outcomes across diverse populations.
△ Less
Submitted 14 October, 2024; v1 submitted 7 October, 2024;
originally announced October 2024.
-
LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
Authors:
Shantanu Ghosh,
Rayan Syed,
Chenyu Wang,
Vaibhav Choudhary,
Binxu Li,
Clare B. Poynton,
Shyam Visweswaran,
Kayhan Batmanghelich
Abstract:
Error slice discovery is crucial to diagnose and mitigate model errors. Current clustering or discrete attribute-based slice discovery methods face key limitations: 1) clustering results in incoherent slices, while assigning discrete attributes to slices leads to incomplete coverage of error patterns due to missing or insufficient attributes; 2) these methods lack complex reasoning, preventing the…
▽ More
Error slice discovery is crucial to diagnose and mitigate model errors. Current clustering or discrete attribute-based slice discovery methods face key limitations: 1) clustering results in incoherent slices, while assigning discrete attributes to slices leads to incomplete coverage of error patterns due to missing or insufficient attributes; 2) these methods lack complex reasoning, preventing them from fully explaining model biases; 3) they fail to integrate \textit{domain knowledge}, limiting their usage in specialized fields \eg radiology. We propose\ladder (\underline{La}nguage-\underline{D}riven \underline{D}iscovery and \underline{E}rror \underline{R}ectification), to address the limitations by: (1) leveraging the flexibility of natural language to address incompleteness, (2) employing LLM's latent \textit{domain knowledge} and advanced reasoning to analyze sentences and derive testable hypotheses directly, identifying biased attributes, and form coherent error slices without clustering. Existing mitigation methods typically address only the worst-performing group, often amplifying errors in other subgroups. In contrast,\ladder generates pseudo attributes from the discovered hypotheses to mitigate errors across all biases without explicit attribute annotations or prior knowledge of bias. Rigorous evaluations on 6 datasets spanning natural and medical images -- comparing 200+ classifiers with diverse architectures, pretraining strategies, and LLMs -- show that\ladder consistently outperforms existing baselines in discovering and mitigating biases.
△ Less
Submitted 29 May, 2025; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
Authors:
Shantanu Ghosh,
Clare B. Poynton,
Shyam Visweswaran,
Kayhan Batmanghelich
Abstract:
The lack of large and diverse training data on Computer-Aided Diagnosis (CAD) in breast cancer detection has been one of the concerns that impedes the adoption of the system. Recently, pre-training with large-scale image text datasets via Vision-Language models (VLM) (\eg CLIP) partially addresses the issue of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP,…
▽ More
The lack of large and diverse training data on Computer-Aided Diagnosis (CAD) in breast cancer detection has been one of the concerns that impedes the adoption of the system. Recently, pre-training with large-scale image text datasets via Vision-Language models (VLM) (\eg CLIP) partially addresses the issue of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP, the first VLM pre-trained on a substantial amount of screening mammogram-report pairs, addressing the challenges of dataset diversity and size. Our experiments on two public datasets demonstrate strong performance in classifying and localizing various mammographic attributes crucial for breast cancer detection, showcasing data efficiency and robustness similar to CLIP in CV. We also propose Mammo-FActOR, a novel feature attribution method, to provide spatial interpretation of representation with sentence-level granularity within mammography reports. Code is available publicly: \url{https://github.com/batmanlab/Mammo-CLIP}.
△ Less
Submitted 22 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Precision Rehabilitation for Patients Post-Stroke based on Electronic Health Records and Machine Learning
Authors:
Fengyi Gao,
Xingyu Zhang,
Sonish Sivarajkumar,
Parker Denny,
Bayan Aldhahwani,
Shyam Visweswaran,
Ryan Shi,
William Hogan,
Allyn Bove,
Yanshan Wang
Abstract:
In this study, we utilized statistical analysis and machine learning methods to examine whether rehabilitation exercises can improve patients post-stroke functional abilities, as well as forecast the improvement in functional abilities. Our dataset is patients' rehabilitation exercises and demographic information recorded in the unstructured electronic health records (EHRs) data and free-text reha…
▽ More
In this study, we utilized statistical analysis and machine learning methods to examine whether rehabilitation exercises can improve patients post-stroke functional abilities, as well as forecast the improvement in functional abilities. Our dataset is patients' rehabilitation exercises and demographic information recorded in the unstructured electronic health records (EHRs) data and free-text rehabilitation procedure notes. We collected data for 265 stroke patients from the University of Pittsburgh Medical Center. We employed a pre-existing natural language processing (NLP) algorithm to extract data on rehabilitation exercises and developed a rule-based NLP algorithm to extract Activity Measure for Post-Acute Care (AM-PAC) scores, covering basic mobility (BM) and applied cognitive (AC) domains, from procedure notes. Changes in AM-PAC scores were classified based on the minimal clinically important difference (MCID), and significance was assessed using Friedman and Wilcoxon tests. To identify impactful exercises, we used Chi-square tests, Fisher's exact tests, and logistic regression for odds ratios. Additionally, we developed five machine learning models-logistic regression (LR), Adaboost (ADB), support vector machine (SVM), gradient boosting (GB), and random forest (RF)-to predict outcomes in functional ability. Statistical analyses revealed significant associations between functional improvements and specific exercises. The RF model achieved the best performance in predicting functional outcomes. In this study, we identified three rehabilitation exercises that significantly contributed to patient post-stroke functional ability improvement in the first two months. Additionally, the successful application of a machine learning model to predict patient-specific functional outcomes underscores the potential for precision rehabilitation.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
A Framework for Human Evaluation of Large Language Models in Healthcare Derived from Literature Review
Authors:
Thomas Yu Chow Tam,
Sonish Sivarajkumar,
Sumit Kapoor,
Alisa V Stolyar,
Katelyn Polanska,
Karleigh R McCarthy,
Hunter Osterhoudt,
Xizhi Wu,
Shyam Visweswaran,
Sunyang Fu,
Piyush Mathur,
Giovanni E. Cacciamani,
Cong Sun,
Yifan Peng,
Yanshan Wang
Abstract:
With generative artificial intelligence (AI), particularly large language models (LLMs), continuing to make inroads in healthcare, it is critical to supplement traditional automated evaluations with human evaluations. Understanding and evaluating the output of LLMs is essential to assuring safety, reliability, and effectiveness. However, human evaluation's cumbersome, time-consuming, and non-stand…
▽ More
With generative artificial intelligence (AI), particularly large language models (LLMs), continuing to make inroads in healthcare, it is critical to supplement traditional automated evaluations with human evaluations. Understanding and evaluating the output of LLMs is essential to assuring safety, reliability, and effectiveness. However, human evaluation's cumbersome, time-consuming, and non-standardized nature presents significant obstacles to comprehensive evaluation and widespread adoption of LLMs in practice. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare. We highlight a notable need for a standardized and consistent human evaluation approach. Our extensive literature search, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, includes publications from January 2018 to February 2024. The review examines the human evaluation of LLMs across various medical specialties, addressing factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Drawing on the diverse evaluation strategies employed in these studies, we propose a comprehensive and practical framework for human evaluation of LLMs: QUEST: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence. This framework aims to improve the reliability, generalizability, and applicability of human evaluation of LLMs in different healthcare applications by defining clear evaluation dimensions and offering detailed guidelines.
△ Less
Submitted 23 September, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines
Authors:
David Oniani,
Xizhi Wu,
Shyam Visweswaran,
Sumit Kapoor,
Shravan Kooragayalu,
Katelyn Polanska,
Yanshan Wang
Abstract:
Background Large Language Models (LLMs), enhanced with Clinical Practice Guidelines (CPGs), can significantly improve Clinical Decision Support (CDS). However, methods for incorporating CPGs into LLMs are not well studied. Methods We develop three distinct methods for incorporating CPGs into LLMs: Binary Decision Tree (BDT), Program-Aided Graph Construction (PAGC), and Chain-of-Thought-Few-Shot Pr…
▽ More
Background Large Language Models (LLMs), enhanced with Clinical Practice Guidelines (CPGs), can significantly improve Clinical Decision Support (CDS). However, methods for incorporating CPGs into LLMs are not well studied. Methods We develop three distinct methods for incorporating CPGs into LLMs: Binary Decision Tree (BDT), Program-Aided Graph Construction (PAGC), and Chain-of-Thought-Few-Shot Prompting (CoT-FSP). To evaluate the effectiveness of the proposed methods, we create a set of synthetic patient descriptions and conduct both automatic and human evaluation of the responses generated by four LLMs: GPT-4, GPT-3.5 Turbo, LLaMA, and PaLM 2. Zero-Shot Prompting (ZSP) was used as the baseline method. We focus on CDS for COVID-19 outpatient treatment as the case study. Results All four LLMs exhibit improved performance when enhanced with CPGs compared to the baseline ZSP. BDT outperformed both CoT-FSP and PAGC in automatic evaluation. All of the proposed methods demonstrated high performance in human evaluation. Conclusion LLMs enhanced with CPGs demonstrate superior performance, as compared to plain LLMs with ZSP, in providing accurate recommendations for COVID-19 outpatient treatment, which also highlights the potential for broader applications beyond the case study.
△ Less
Submitted 23 January, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images
Authors:
Yanwu Xu,
Li Sun,
Wei Peng,
Shuyue Jia,
Katelyn Morrison,
Adam Perer,
Afrooz Zandifar,
Shyam Visweswaran,
Motahhare Eslami,
Kayhan Batmanghelich
Abstract:
This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by pr…
▽ More
This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. The results of comparative assessments indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines, airways, and vascular structures. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks.
△ Less
Submitted 15 October, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing
Authors:
Sonish Sivarajkumar,
Mark Kelley,
Alyssa Samolyk-Mazzanti,
Shyam Visweswaran,
Yanshan Wang
Abstract:
Large language models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), especially in domains where labeled data is scarce or expensive, such as clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is…
▽ More
Large language models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), especially in domains where labeled data is scarce or expensive, such as clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches. In this paper, we present a comprehensive and systematic experimental study on prompt engineering for five clinical NLP tasks: Clinical Sense Disambiguation, Biomedical Evidence Extraction, Coreference Resolution, Medication Status Extraction, and Medication Attribute Extraction. We assessed the prompts proposed in recent literature, including simple prefix, simple cloze, chain of thought, and anticipatory prompts, and introduced two new types of prompts, namely heuristic prompting and ensemble prompting. We evaluated the performance of these prompts on three state-of-the-art LLMs: GPT-3.5, BARD, and LLAMA2. We also contrasted zero-shot prompting with few-shot prompting, and provide novel insights and guidelines for prompt engineering for LLMs in clinical NLP. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative AI, and we hope that it will inspire and inform future research in this area.
△ Less
Submitted 14 September, 2023;
originally announced September 2023.
-
Mining Clinical Notes for Physical Rehabilitation Exercise Information: Natural Language Processing Algorithm Development and Validation Study
Authors:
Sonish Sivarajkumar,
Fengyi Gao,
Parker E. Denny,
Bayan M. Aldhahwani,
Shyam Visweswaran,
Allyn Bove,
Yanshan Wang
Abstract:
Post-stroke patient rehabilitation requires precise, personalized treatment plans. Natural Language Processing (NLP) offers potential to extract valuable exercise information from clinical notes, aiding in the development of more effective rehabilitation strategies. Objective: This study aims to develop and evaluate a variety of NLP algorithms to extract and categorize physical rehabilitation exer…
▽ More
Post-stroke patient rehabilitation requires precise, personalized treatment plans. Natural Language Processing (NLP) offers potential to extract valuable exercise information from clinical notes, aiding in the development of more effective rehabilitation strategies. Objective: This study aims to develop and evaluate a variety of NLP algorithms to extract and categorize physical rehabilitation exercise information from the clinical notes of post-stroke patients treated at the University of Pittsburgh Medical Center. A cohort of 13,605 patients diagnosed with stroke was identified, and their clinical notes containing rehabilitation therapy notes were retrieved. A comprehensive clinical ontology was created to represent various aspects of physical rehabilitation exercises. State-of-the-art NLP algorithms were then developed and compared, including rule-based, machine learning-based algorithms, and large language model (LLM)-based algorithms (ChatGPT). Analysis was conducted on a dataset comprising 23,724 notes with detailed demographic and clinical characteristics. The rule-based NLP algorithm demonstrated superior performance in most areas, particularly in detecting the 'Right Side' location with an F1 score of 0.975, outperforming Gradient Boosting by 0.063. Gradient Boosting excelled in 'Lower Extremity' location detection (F1 score: 0.978), surpassing rule-based NLP by 0.023. It also showed notable performance in 'Passive Range of Motion' with an F1 score of 0.970, a 0.032 improvement over rule-based NLP. The rule-based algorithm efficiently handled 'Duration', 'Sets', and 'Reps' with F1 scores up to 0.65. LLM-based NLP, particularly ChatGPT with few-shot prompts, achieved high recall but generally lower precision and F1 scores. However, it notably excelled in 'Backward Plane' motion detection, achieving an F1 score of 0.846, surpassing the rule-based algorithm's 0.720.
△ Less
Submitted 15 March, 2024; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Hyperbolic Molecular Representation Learning for Drug Repositioning
Authors:
Ke Yu,
Shyam Visweswaran,
Kayhan Batmanghelich
Abstract:
Learning accurate drug representations is essential for task such as computational drug repositioning. A drug hierarchy is a valuable source that encodes knowledge of relations among drugs in a tree-like structure where drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not…
▽ More
Learning accurate drug representations is essential for task such as computational drug repositioning. A drug hierarchy is a valuable source that encodes knowledge of relations among drugs in a tree-like structure where drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not yet been explored, and currently described drug representations cannot place novel molecules in a drug hierarchy. Here, we develop a semi-supervised drug embedding that incorporates two sources of information: (1) underlying chemical grammar that is inferred from chemical structures of drugs and drug-like molecules (unsupervised), and (2) hierarchical relations that are encoded in an expert-crafted hierarchy of approved drugs (supervised). We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the drug-drug similarity information obtained from the hierarchy to induce the clustering of drugs in hyperbolic space. The hyperbolic space is amenable for encoding hierarchical relations. Our qualitative results support that the learned drug embedding can induce the hierarchical relations among drugs. We demonstrate that the learned drug embedding can be used for drug repositioning.
△ Less
Submitted 6 July, 2022;
originally announced August 2022.
-
Extraction of Sleep Information from Clinical Notes of Patients with Alzheimer's Disease Using Natural Language Processing
Authors:
Sonish Sivarajkumar,
Thomas Yu CHow Tam,
Haneef Ahamed Mohammad,
Samual Viggiano,
David Oniani,
Shyam Visweswaran,
Yanshan Wang
Abstract:
Alzheimer's Disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information…
▽ More
Alzheimer's Disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. A gold standard dataset is created from manual annotation of 570 randomly sampled clinical note documents from the adSLEEP, a corpus of 192,000 de-identified clinical notes of 7,266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based Natural Language Processing (NLP) algorithm, machine learning models, and Large Language Model(LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset. Rule-based NLP algorithm achieved the best performance of F1 across all sleep-related concepts. In terms of Positive Predictive Value (PPV), rule-based NLP algorithm achieved 1.00 for daytime sleepiness and sleep duration, machine learning models: 0.95 and for napping, 0.86 for bad sleep quality and 0.90 for snoring; and LLAMA2 with finetuning achieved PPV of 0.93 for Night Wakings, 0.89 for sleep problem, and 1.00 for sleep duration. The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD, but could be extended to general sleep information extraction for other diseases.
△ Less
Submitted 15 March, 2024; v1 submitted 8 March, 2022;
originally announced April 2022.
-
Semi-Supervised Hierarchical Drug Embedding in Hyperbolic Space
Authors:
Ke Yu,
Shyam Visweswaran,
Kayhan Batmanghelich
Abstract:
Learning accurate drug representation is essential for tasks such as computational drug repositioning and prediction of drug side-effects. A drug hierarchy is a valuable source that encodes human knowledge of drug relations in a tree-like structure where drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in le…
▽ More
Learning accurate drug representation is essential for tasks such as computational drug repositioning and prediction of drug side-effects. A drug hierarchy is a valuable source that encodes human knowledge of drug relations in a tree-like structure where drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not yet been explored, and currently described drug representations cannot place novel molecules in a drug hierarchy. Here, we develop a semi-supervised drug embedding that incorporates two sources of information: (1) underlying chemical grammar that is inferred from molecular structures of drugs and drug-like molecules (unsupervised), and (2) hierarchical relations that are encoded in an expert-crafted hierarchy of approved drugs (supervised). We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the knowledge-based drug-drug similarity to induce the clustering of drugs in hyperbolic space. The hyperbolic space is amenable for encoding hierarchical concepts. Both quantitative and qualitative results support that the learned drug embedding can accurately reproduce the chemical structure and induce the hierarchical relations among drugs. Furthermore, our approach can infer the pharmacological properties of novel molecules by retrieving similar drugs from the embedding space. We demonstrate that the learned drug embedding can be used to find new uses for existing drugs and to discover side-effects. We show that it significantly outperforms baselines in both tasks.
△ Less
Submitted 1 June, 2020;
originally announced June 2020.
-
Dirac Delta Regression: Conditional Density Estimation with Clinical Trials
Authors:
Eric V. Strobl,
Shyam Visweswaran
Abstract:
Personalized medicine seeks to identify the causal effect of treatment for a particular patient as opposed to a clinical population at large. Most investigators estimate such personalized treatment effects by regressing the outcome of a randomized clinical trial (RCT) on patient covariates. The realized value of the outcome may however lie far from the conditional expectation. We therefore introdu…
▽ More
Personalized medicine seeks to identify the causal effect of treatment for a particular patient as opposed to a clinical population at large. Most investigators estimate such personalized treatment effects by regressing the outcome of a randomized clinical trial (RCT) on patient covariates. The realized value of the outcome may however lie far from the conditional expectation. We therefore introduce a method called Dirac Delta Regression (DDR) that estimates the entire conditional density from RCT data in order to visualize the probabilities across all possible outcome values. DDR transforms the outcome into a set of asymptotically Dirac delta distributions and then estimates the density using non-linear regression. The algorithm can identify significant differences in patient-specific outcomes even when no population level effect exists. Moreover, DDR outperforms state-of-the-art algorithms in conditional density estimation by a large margin even in the small sample regime. An R package is available at https://github.com/ericstrobl/DDR.
△ Less
Submitted 1 September, 2021; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Fast Causal Inference with Non-Random Missingness by Test-Wise Deletion
Authors:
Eric V. Strobl,
Shyam Visweswaran,
Peter L. Spirtes
Abstract:
Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain on…
▽ More
Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain only a few missing values. In this report, we show that we can more efficiently utilize the observed values with test-wise deletion while still maintaining algorithmic soundness. Here, test-wise deletion refers to the process of list-wise deleting samples only among the variables required for each conditional independence (CI) test used in constraint-based searches. Test-wise deletion therefore often saves more samples than list-wise deletion for each CI test, especially when we have a sparse underlying graph. Our theoretical results show that test-wise deletion is sound under the justifiable assumption that none of the missingness mechanisms causally affect each other in the underlying causal graph. We also find that FCI and RFCI with test-wise deletion outperform their list-wise deletion and imputation counterparts on average when MNAR holds in both synthetic and real data.
△ Less
Submitted 24 May, 2017;
originally announced May 2017.
-
Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery
Authors:
Eric V. Strobl,
Kun Zhang,
Shyam Visweswaran
Abstract:
Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubicly with sample size. We therefore devise two relaxations called the Ran…
▽ More
Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubicly with sample size. We therefore devise two relaxations called the Randomized Conditional Independence Test (RCIT) and the Randomized conditional Correlation Test (RCoT) which both approximate KCIT by utilizing random Fourier features. In practice, both of the proposed tests scale linearly with sample size and return accurate p-values much faster than KCIT in the large sample size context. CCD algorithms run with RCIT or RCoT also return graphs at least as accurate as the same algorithms run with KCIT but with large reductions in run time.
△ Less
Submitted 12 April, 2017; v1 submitted 13 February, 2017;
originally announced February 2017.
-
Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values
Authors:
Eric V. Strobl,
Peter L. Spirtes,
Shyam Visweswaran
Abstract:
The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and co…
▽ More
The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. PC-p specifically uses the p-values returned by many conditional independence tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original PC algorithm also help PC-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that PC-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.
△ Less
Submitted 9 May, 2017; v1 submitted 13 July, 2016;
originally announced July 2016.
-
Markov Boundary Discovery with Ridge Regularized Linear Models
Authors:
Eric V. Strobl,
Shyam Visweswaran
Abstract:
Ridge regularized linear models (RRLMs), such as ridge regression and the SVM, are a popular group of methods that are used in conjunction with coefficient hypothesis testing to discover explanatory variables with a significant multivariate association to a response. However, many investigators are reluctant to draw causal interpretations of the selected variables due to the incomplete knowledge o…
▽ More
Ridge regularized linear models (RRLMs), such as ridge regression and the SVM, are a popular group of methods that are used in conjunction with coefficient hypothesis testing to discover explanatory variables with a significant multivariate association to a response. However, many investigators are reluctant to draw causal interpretations of the selected variables due to the incomplete knowledge of the capabilities of RRLMs in causal inference. Under reasonable assumptions, we show that a modified form of RRLMs can get very close to identifying a subset of the Markov boundary by providing a worst-case bound on the space of possible solutions. The results hold for any convex loss, even when the underlying functional relationship is nonlinear, and the solution is not unique. Our approach combines ideas in Markov boundary and sufficient dimension reduction theory. Experimental results show that the modified RRLMs are competitive against state-of-the-art algorithms in discovering part of the Markov boundary from gene expression data.
△ Less
Submitted 13 September, 2015;
originally announced September 2015.
-
Dependence versus Conditional Dependence in Local Causal Discovery from Gene Expression Data
Authors:
Eric V. Strobl,
Shyam Visweswaran
Abstract:
Motivation: Algorithms that discover variables which are causally related to a target may inform the design of experiments. With observational gene expression data, many methods discover causal variables by measuring each variable's degree of statistical dependence with the target using dependence measures (DMs). However, other methods measure each variable's ability to explain the statistical dep…
▽ More
Motivation: Algorithms that discover variables which are causally related to a target may inform the design of experiments. With observational gene expression data, many methods discover causal variables by measuring each variable's degree of statistical dependence with the target using dependence measures (DMs). However, other methods measure each variable's ability to explain the statistical dependence between the target and the remaining variables in the data using conditional dependence measures (CDMs), since this strategy is guaranteed to find the target's direct causes, direct effects, and direct causes of the direct effects in the infinite sample limit. In this paper, we design a new algorithm in order to systematically compare the relative abilities of DMs and CDMs in discovering causal variables from gene expression data.
Results: The proposed algorithm using a CDM is sample efficient, since it consistently outperforms other state-of-the-art local causal discovery algorithms when samples sizes are small. However, the proposed algorithm using a CDM outperforms the proposed algorithm using a DM only when sample sizes are above several hundred. These results suggest that accurate causal discovery from gene expression data using current CDM-based algorithms requires datasets with at least several hundred samples.
Availability: The proposed algorithm is freely available at https://github.com/ericstrobl/DvCD.
△ Less
Submitted 28 July, 2014;
originally announced July 2014.
-
Counting Markov Blanket Structures
Authors:
Shyam Visweswaran,
Gregory F. Cooper
Abstract:
Learning Markov blanket (MB) structures has proven useful in performing feature selection, learning Bayesian networks (BNs), and discovering causal relationships. We present a formula for efficiently determining the number of MB structures given a target variable and a set of other variables. As expected, the number of MB structures grows exponentially. However, we show quantitatively that there a…
▽ More
Learning Markov blanket (MB) structures has proven useful in performing feature selection, learning Bayesian networks (BNs), and discovering causal relationships. We present a formula for efficiently determining the number of MB structures given a target variable and a set of other variables. As expected, the number of MB structures grows exponentially. However, we show quantitatively that there are many fewer MB structures that contain the target variable than there are BN structures that contain it. In particular, the ratio of BN structures to MB structures appears to increase exponentially in the number of variables.
△ Less
Submitted 12 July, 2014; v1 submitted 9 July, 2014;
originally announced July 2014.
-
Markov Blanket Ranking using Kernel-based Conditional Dependence Measures
Authors:
Eric V. Strobl,
Shyam Visweswaran
Abstract:
Developing feature selection algorithms that move beyond a pure correlational to a more causal analysis of observational data is an important problem in the sciences. Several algorithms attempt to do so by discovering the Markov blanket of a target, but they all contain a forward selection step which variables must pass in order to be included in the conditioning set. As a result, these algorithms…
▽ More
Developing feature selection algorithms that move beyond a pure correlational to a more causal analysis of observational data is an important problem in the sciences. Several algorithms attempt to do so by discovering the Markov blanket of a target, but they all contain a forward selection step which variables must pass in order to be included in the conditioning set. As a result, these algorithms may not consider all possible conditional multivariate combinations. We improve on this limitation by proposing a backward elimination method that uses a kernel-based conditional dependence measure to identify the Markov blanket in a fully multivariate fashion. The algorithm is easy to implement and compares favorably to other methods on synthetic and real datasets.
△ Less
Submitted 2 May, 2014; v1 submitted 1 February, 2014;
originally announced February 2014.
-
Deep Multiple Kernel Learning
Authors:
Eric Strobl,
Shyam Visweswaran
Abstract:
Deep learning methods have predominantly been applied to large artificial neural networks. Despite their state-of-the-art performance, these large networks typically do not generalize well to datasets with limited sample sizes. In this paper, we take a different approach by learning multiple layers of kernels. We combine kernels at each layer and then optimize over an estimate of the support vecto…
▽ More
Deep learning methods have predominantly been applied to large artificial neural networks. Despite their state-of-the-art performance, these large networks typically do not generalize well to datasets with limited sample sizes. In this paper, we take a different approach by learning multiple layers of kernels. We combine kernels at each layer and then optimize over an estimate of the support vector machine leave-one-out error rather than the dual objective function. Our experiments on a variety of datasets show that each layer successively increases performance with only a few base kernels.
△ Less
Submitted 11 October, 2013;
originally announced October 2013.