Search | arXiv e-print repository

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

Authors: Zihao Lin, Samyadeep Basu, Mohammad Beigi, Varun Manjunatha, Ryan A. Rossi, Zichao Wang, Yufan Zhou, Sriram Balasubramanian, Arman Zarei, Keivan Rezaei, Ying Shen, Barry Menglong Yao, Zhiyang Xu, Qin Liu, Yuxiang Zhang, Yan Sun, Shilong Liu, Li Shen, Hongxuan Li, Soheil Feizi, Lifu Huang

Abstract: The rise of foundation models has transformed machine learning research, prompting efforts to uncover their inner workings and develop more efficient and reliable applications for better control. While significant progress has been made in interpreting Large Language Models (LLMs), multimodal foundation models (MMFMs) - such as contrastive vision-language models, generative vision-language models,… ▽ More The rise of foundation models has transformed machine learning research, prompting efforts to uncover their inner workings and develop more efficient and reliable applications for better control. While significant progress has been made in interpreting Large Language Models (LLMs), multimodal foundation models (MMFMs) - such as contrastive vision-language models, generative vision-language models, and text-to-image models - pose unique interpretability challenges beyond unimodal frameworks. Despite initial studies, a substantial gap remains between the interpretability of LLMs and MMFMs. This survey explores two key aspects: (1) the adaptation of LLM interpretability methods to multimodal models and (2) understanding the mechanistic differences between unimodal language models and crossmodal systems. By systematically reviewing current MMFM analysis techniques, we propose a structured taxonomy of interpretability methods, compare insights across unimodal and multimodal architectures, and highlight critical research gaps. △ Less

Submitted 22 February, 2025; originally announced February 2025.

Comments: 30 pages, 4 Figures, 10 Tables

arXiv:2411.07317 [pdf, other]

SynRL: Aligning Synthetic Clinical Trial Data with Human-preferred Clinical Endpoints Using Reinforcement Learning

Authors: Trisha Das, Zifeng Wang, Afrah Shafquat, Mandis Beigi, Jason Mezey, Jacob Aptekar, Jimeng Sun

Abstract: Each year, hundreds of clinical trials are conducted to evaluate new medical interventions, but sharing patient records from these trials with other institutions can be challenging due to privacy concerns and federal regulations. To help mitigate privacy concerns, researchers have proposed methods for generating synthetic patient data. However, existing approaches for generating synthetic clinical… ▽ More Each year, hundreds of clinical trials are conducted to evaluate new medical interventions, but sharing patient records from these trials with other institutions can be challenging due to privacy concerns and federal regulations. To help mitigate privacy concerns, researchers have proposed methods for generating synthetic patient data. However, existing approaches for generating synthetic clinical trial data disregard the usage requirements of these data, including maintaining specific properties of clinical outcomes, and only use post hoc assessments that are not coupled with the data generation process. In this paper, we propose SynRL which leverages reinforcement learning to improve the performance of patient data generators by customizing the generated data to meet the user-specified requirements for synthetic data outcomes and endpoints. Our method includes a data value critic function to evaluate the quality of the generated data and uses reinforcement learning to align the data generator with the users' needs based on the critic's feedback. We performed experiments on four clinical trial datasets and demonstrated the advantages of SynRL in improving the quality of the generated synthetic data while keeping the privacy risks low. We also show that SynRL can be utilized as a general framework that can customize data generation of multiple types of synthetic data generators. Our code is available at https://anonymous.4open.science/r/SynRL-DB0F/. △ Less

Submitted 17 February, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

arXiv:2410.20199 [pdf, other]

Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models

Authors: Mohammad Beigi, Sijia Wang, Ying Shen, Zihao Lin, Adithya Kulkarni, Jianfeng He, Feng Chen, Ming Jin, Jin-Hee Cho, Dawei Zhou, Chang-Tien Lu, Lifu Huang

Abstract: In recent years, Large Language Models (LLMs) have become fundamental to a broad spectrum of artificial intelligence applications. As the use of LLMs expands, precisely estimating the uncertainty in their predictions has become crucial. Current methods often struggle to accurately identify, measure, and address the true uncertainty, with many focusing primarily on estimating model confidence. This… ▽ More In recent years, Large Language Models (LLMs) have become fundamental to a broad spectrum of artificial intelligence applications. As the use of LLMs expands, precisely estimating the uncertainty in their predictions has become crucial. Current methods often struggle to accurately identify, measure, and address the true uncertainty, with many focusing primarily on estimating model confidence. This discrepancy is largely due to an incomplete understanding of where, when, and how uncertainties are injected into models. This paper introduces a comprehensive framework specifically designed to identify and understand the types and sources of uncertainty, aligned with the unique characteristics of LLMs. Our framework enhances the understanding of the diverse landscape of uncertainties by systematically categorizing and defining each type, establishing a solid foundation for developing targeted methods that can precisely quantify these uncertainties. We also provide a detailed introduction to key related concepts and examine the limitations of current methods in mission-critical and safety-sensitive applications. The paper concludes with a perspective on future directions aimed at enhancing the reliability and practical adoption of these methods in real-world scenarios. △ Less

Submitted 26 October, 2024; originally announced October 2024.

arXiv:2409.07089 [pdf, other]

TrialSynth: Generation of Synthetic Sequential Clinical Trial Data

Authors: Chufan Gao, Mandis Beigi, Afrah Shafquat, Jacob Aptekar, Jimeng Sun

Abstract: Analyzing data from past clinical trials is part of the ongoing effort to optimize the design, implementation, and execution of new clinical trials and more efficiently bring life-saving interventions to market. While there have been recent advances in the generation of static context synthetic clinical trial data, due to both limited patient availability and constraints imposed by patient privacy… ▽ More Analyzing data from past clinical trials is part of the ongoing effort to optimize the design, implementation, and execution of new clinical trials and more efficiently bring life-saving interventions to market. While there have been recent advances in the generation of static context synthetic clinical trial data, due to both limited patient availability and constraints imposed by patient privacy needs, the generation of fine-grained synthetic time-sequential clinical trial data has been challenging. Given that patient trajectories over an entire clinical trial are of high importance for optimizing trial design and efforts to prevent harmful adverse events, there is a significant need for the generation of high-fidelity time-sequence clinical trial data. Here we introduce TrialSynth, a Variational Autoencoder (VAE) designed to address the specific challenges of generating synthetic time-sequence clinical trial data. Distinct from related clinical data VAE methods, the core of our method leverages Hawkes Processes (HP), which are particularly well-suited for modeling event-type and time gap prediction needed to capture the structure of sequential clinical trial data. Our experiments demonstrate that TrialSynth surpasses the performance of other comparable methods that can generate sequential clinical trial data at varying levels of fidelity / privacy tradeoff, enabling the generation of highly accurate event sequences across multiple real-world sequential event datasets with small patient source populations. Notably, our empirical findings highlight that TrialSynth not only outperforms existing clinical sequence-generating methods but also produces data with superior utility while empirically preserving patient privacy. △ Less

Submitted 12 December, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

arXiv:2408.15302 [pdf]

Corrigendum to: A Systematic Study of DDR4 DRAM Faults in the Field

Authors: Majed Valad Beigi, Yi Cao, Sudhanva Gurumurthi, Charles Recchia, Andrew Walton, Vilas Sridharan

Abstract: This paper is a corrigendum to the paper by Beigi et al. published at HPCA 2023 https://doi.org/10.1109/HPCA56546.2023.10071066. The HPCA paper presented a detailed field data analysis of faults observed at scale in DDR4 DRAM from two different memory vendors. This analysis included a breakdown of fault patterns or modes. Upon further study of the data, we found a bug in how we decoded errors base… ▽ More This paper is a corrigendum to the paper by Beigi et al. published at HPCA 2023 https://doi.org/10.1109/HPCA56546.2023.10071066. The HPCA paper presented a detailed field data analysis of faults observed at scale in DDR4 DRAM from two different memory vendors. This analysis included a breakdown of fault patterns or modes. Upon further study of the data, we found a bug in how we decoded errors based on the logged row-bank-column address. Specifically, we found that some errors that occurred in one column were mis-interpreted as occurring in two non-adjacent columns. As a result of this, some single-bit faults were misclassified as partial-row faults (i.e., two-bit faults). Similarly, some single-column faults were misclassified as two-column faults. The result of these misclassification errors is that the proportion of single-bit faults is higher than reported in the paper, with a commensurate reduction in the fraction of certain types of multi-bit faults. These misclassifications also slightly change the Failure In Time (FIT) per DRAM device values presented in the original paper. In this corrigendum, we provide an updated version of the relevant tables and figures and point out the corresponding page numbers and references in the original paper that they replace. △ Less

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2406.12053 [pdf, other]

InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States

Authors: Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang

Abstract: Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention st… ▽ More Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention states, feed-forward states, and activation states of all layers. Unlike existing methods that primarily focus on the final activation state, InternalInspector conducts a comprehensive analysis across all internal states of every layer to accurately identify both correct and incorrect prediction processes. By benchmarking InternalInspector against existing confidence estimation methods across various natural language understanding and generation tasks, including factual question answering, commonsense reasoning, and reading comprehension, InternalInspector achieves significantly higher accuracy in aligning the estimated confidence scores with the correctness of the LLM's predictions and lower calibration error. Furthermore, InternalInspector excels at HaluEval, a hallucination detection benchmark, outperforming other internal-based confidence estimation methods in this task. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2402.11122 [pdf, other]

Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models

Authors: Zihao Lin, Mohammad Beigi, Hongxuan Li, Yufan Zhou, Yuxiang Zhang, Qifan Wang, Wenpeng Yin, Lifu Huang

Abstract: Memory Editing (ME) has emerged as an efficient method to modify erroneous facts or inject new facts into Large Language Models (LLMs). Two mainstream ME methods exist: parameter-modifying ME and parameter-preserving ME (integrating extra modules while preserving original parameters). Regrettably, previous studies on ME evaluation have two critical limitations: (i) evaluating LLMs with single edit… ▽ More Memory Editing (ME) has emerged as an efficient method to modify erroneous facts or inject new facts into Large Language Models (LLMs). Two mainstream ME methods exist: parameter-modifying ME and parameter-preserving ME (integrating extra modules while preserving original parameters). Regrettably, previous studies on ME evaluation have two critical limitations: (i) evaluating LLMs with single edit only, neglecting the need for continuous editing, and (ii) evaluations focusing solely on basic factual triples, overlooking broader LLM capabilities like logical reasoning and reading understanding. This study addresses these limitations with contributions threefold: (i) We explore how ME affects a wide range of fundamental capabilities of LLMs under sequential editing. Experimental results reveal an intriguing phenomenon: Most parameter-modifying ME consistently degrade performance across all tasks after a few sequential edits. In contrast, parameter-preserving ME effectively maintains LLMs' fundamental capabilities but struggles to accurately recall edited knowledge presented in a different format. (ii) We extend our evaluation to different editing settings, such as layers to edit, model size, instruction tuning, etc. Experimental findings indicate several strategies that can potentially mitigate the adverse effects of ME. (iii) We further explain why parameter-modifying ME damages LLMs from three dimensions: parameter changes after editing, language modeling capability, and the in-context learning capability. Our in-depth study advocates more careful use of ME in real-world scenarios. △ Less

Submitted 16 February, 2024; originally announced February 2024.

Comments: preprint, 15 pages

arXiv:1105.5675 [pdf]

Scale-Invariant Local Descriptor for Event Recognition in 1D Sensor Signals

Authors: Jierui Xie, Mandis S. Beigi

Abstract: In this paper, we introduce a shape-based, time-scale invariant feature descriptor for 1-D sensor signals. The time-scale invariance of the feature allows us to use feature from one training event to describe events of the same semantic class which may take place over varying time scales such as walking slow and walking fast. Therefore it requires less training set. The descriptor takes advantage… ▽ More In this paper, we introduce a shape-based, time-scale invariant feature descriptor for 1-D sensor signals. The time-scale invariance of the feature allows us to use feature from one training event to describe events of the same semantic class which may take place over varying time scales such as walking slow and walking fast. Therefore it requires less training set. The descriptor takes advantage of the invariant location detection in the scale space theory and employs a high level shape encoding scheme to capture invariant local features of events. Based on this descriptor, a scale-invariant classifier with "R" metric (SIC-R) is designed to recognize multi-scale events of human activities. The R metric combines the number of matches of keypoint in scale space with the Dynamic Time Warping score. SICR is tested on various types of 1-D sensors data from passive infrared, accelerometer and seismic sensors with more than 90% classification accuracy. △ Less

Submitted 27 May, 2011; originally announced May 2011.

Journal ref: IEEE International Conference on Multimedia & Expo(ICME),Page(s):1226 - 1229, 2009

Showing 1–8 of 8 results for author: Beigi, M