-
Tracing and Reversing Rank-One Model Edits
Authors:
Paul Youssef,
Zhixue Zhao,
Christin Seifert,
Jörg Schlötterer
Abstract:
Knowledge editing methods (KEs) are a cost-effective way to update the factual content of large language models (LLMs), but they pose a dual-use risk. While KEs are beneficial for updating outdated or incorrect information, they can be exploited maliciously to implant misinformation or bias. In order to defend against these types of malicious manipulation, we need robust techniques that can reliably detect, interpret, and mitigate adversarial edits. This work investigates the traceability and reversibility of knowledge edits, focusing on the widely used Rank-One Model Editing (ROME) method. We first show that ROME introduces distinctive distributional patterns in the edited weight matrices, which can serve as effective signals for locating the edited weights. Second, we show that these altered weights can reliably be used to predict the edited factual relation, enabling partial reconstruction of the modified fact. Building on this, we propose a method to infer the edited object entity directly from the modified weights, without access to the editing prompt, achieving over 95% accuracy. Finally, we demonstrate that ROME edits can be reversed, recovering the model's original outputs with $\geq$ 80% accuracy. Our findings highlight the feasibility of detecting, tracing, and reversing edits based on the edited weights, offering a robust framework for safeguarding LLMs against adversarial manipulations.
Submitted 27 May, 2025;
originally announced May 2025.
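ROME writes a new fact into a single MLP weight matrix by adding a rank-one matrix, W' = W + u v^T (here u and v are placeholders for the vectors ROME derives from the edited key and target value). The plain-Python sketch below, using exact `Fraction` arithmetic, only illustrates why such an edit leaves a structurally distinctive trace: the difference W' - W has rank one, and subtracting the same outer product restores W exactly. It assumes u and v are known, which is not the setting of the paper (where edits are traced and reversed from the edited weights alone); the matrix values and helper names `outer`, `add`, `rank` are hypothetical.

```python
from fractions import Fraction


def outer(u, v):
    """Return the outer product u v^T as a nested list (len(u) rows)."""
    return [[ui * vj for vj in v] for ui in u]


def add(a, b, sign=1):
    """Element-wise a + sign * b for two equally shaped nested lists."""
    return [[x + sign * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def rank(m):
    """Exact matrix rank via Gauss-Jordan elimination over Fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


# Hypothetical weight matrix W (full rank) and update vectors u, v.
W = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
u = [1, 0, 2]
v = [3, -1, 1]

W_edited = add(W, outer(u, v))                    # the edit: W' = W + u v^T
delta = add(W_edited, W, sign=-1)                 # W' - W
W_restored = add(W_edited, outer(u, v), sign=-1)  # subtract the same update

print(rank(W), rank(delta), W_restored == W)  # 3 1 True
```

Exact fractions are used so that the rank test is not disturbed by floating-point round-off; on real model weights one would instead inspect singular values numerically.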
-
Truth or Twist? Optimal Model Selection for Reliable Label Flipping Evaluation in LLM-based Counterfactuals
Authors:
Qianli Wang,
Van Bach Nguyen,
Nils Feldhus,
Luis Felipe Villa-Arenas,
Christin Seifert,
Sebastian Möller,
Vera Schmitt
Abstract:
Counterfactual examples are widely employed to enhance the performance and robustness of large language models (LLMs) through counterfactual data augmentation (CDA). However, the selection of the judge model used to evaluate label flipping, the primary metric for assessing the validity of generated counterfactuals for CDA, yields inconsistent results. To decipher this, we define four types of relationships between the counterfactual generator and judge models. Through extensive experiments involving two state-of-the-art LLM-based methods, three datasets, five generator models, and 15 judge models, complemented by a user study (n = 90), we demonstrate that judge models with an independent, non-fine-tuned relationship to the generator model provide the most reliable label flipping evaluations. Relationships between the generator and judge models, which are closely aligned with the user study for CDA, result in better model performance and robustness. Nevertheless, we find that the gap between the most effective judge models and the results obtained from the user study remains considerably large. This suggests that a fully automated pipeline for CDA may be inadequate and requires human intervention.
Submitted 20 May, 2025;
originally announced May 2025.
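Label flipping, the metric whose judge-dependence the abstract examines, can be computed as the share of counterfactuals whose judge-predicted label differs from the label of the original instance. The sketch below assumes this common definition (the paper's exact formulation may differ) and uses hypothetical toy labels to show how two judges can score the same counterfactuals differently.

```python
def label_flip_rate(original_labels, judge_predictions):
    """Fraction of counterfactuals whose judge-predicted label differs from
    the label of the original instance (one common definition)."""
    if len(original_labels) != len(judge_predictions):
        raise ValueError("expected one judge prediction per counterfactual")
    if not original_labels:
        raise ValueError("no counterfactuals to evaluate")
    flips = sum(o != p for o, p in zip(original_labels, judge_predictions))
    return flips / len(original_labels)


# Hypothetical toy data: labels of four original instances, and the labels
# two different judge models assign to the corresponding counterfactuals.
original = ["pos", "pos", "neg", "neg"]
judge_a = ["neg", "neg", "pos", "neg"]
judge_b = ["neg", "pos", "neg", "pos"]

print(label_flip_rate(original, judge_a))  # 0.75
print(label_flip_rate(original, judge_b))  # 0.5
```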
-
Explanation format does not matter; but explanations do -- An Eggsbert study on explaining Bayesian Optimisation tasks
Authors:
Tanmay Chakraborty,
Marion Koelle,
Jörg Schlötterer,
Nadine Schlicker,
Christian Wirth,
Christin Seifert
Abstract:
Bayesian Optimisation (BO) is a family of methods for finding optimal parameters when the underlying function to be optimised is unknown. BO is used, for example, for hyperparameter tuning in machine learning and as an expert support tool for tuning cyberphysical systems. For settings where humans are involved in the tuning task, methods have been developed to explain BO (Explainable Bayesian Optimization, XBO). However, there is little guidance on how to present XBO results to humans so that they can tune the system effectively and efficiently. In this paper, we investigate how the XBO explanation format affects users' task performance, task load, understanding and trust in XBO. We chose a task that is accessible to a wide range of users. Specifically, we set up an egg cooking scenario with 6 parameters that participants had to adjust to achieve a perfect soft-boiled egg. We compared three different explanation formats: a bar chart, a list of rules and a textual explanation in a between-subjects online study with 213 participants. Our results show that adding any type of explanation increases task success, reduces the number of trials needed to achieve success, and improves comprehension and confidence. While explanations add more information for participants to process, we found no increase in user task load. We also found that the aforementioned results were independent of the explanation format; all formats had a similar effect. This is an interesting finding for practical applications, as it suggests that explanations can be added to BO tuning tasks without the burden of designing or selecting specific explanation formats. In the future, it would be interesting to investigate scenarios of prolonged use of the explanation formats and whether they have different effects on users' mental models of the underlying system.
Submitted 30 April, 2025; v1 submitted 29 April, 2025;
originally announced April 2025.
-
Invariant Learning with Annotation-free Environments
Authors:
Phuong Quynh Le,
Christin Seifert,
Jörg Schlötterer
Abstract:
Invariant learning is a promising approach to improve domain generalization compared to Empirical Risk Minimization (ERM). However, most invariant learning methods rely on the assumption that training examples are pre-partitioned into different known environments. We instead infer environments without the need for additional annotations, motivated by observations of the properties within the representation space of a trained ERM model. We show the preliminary effectiveness of our approach on the ColoredMNIST benchmark, achieving performance comparable to methods requiring explicit environment labels and on par with an annotation-free method that poses strong restrictions on the ERM reference model.
Submitted 22 April, 2025;
originally announced April 2025.
-
An XAI-based Analysis of Shortcut Learning in Neural Networks
Authors:
Phuong Quynh Le,
Jörg Schlötterer,
Christin Seifert
Abstract:
Machine learning models tend to learn spurious features: features that strongly correlate with target labels but are not causal. Existing approaches to mitigate models' dependence on spurious features work in some cases, but fail in others. In this paper, we systematically analyze how and where neural networks encode spurious correlations. We introduce the neuron spurious score, an XAI-based diagnostic measure to quantify a neuron's dependence on spurious features. We analyze both convolutional neural networks (CNNs) and vision transformers (ViTs) using architecture-specific methods. Our results show that spurious features are partially disentangled, but the degree of disentanglement varies across model architectures. Furthermore, we find that the assumptions behind existing mitigation methods are incomplete. Our results lay the groundwork for the development of novel methods to mitigate spurious correlations and make AI models safer to use in practice.
Submitted 22 April, 2025;
originally announced April 2025.
-
Comparative Explanations: Explanation Guided Decision Making for Human-in-the-Loop Preference Selection
Authors:
Tanmay Chakraborty,
Christian Wirth,
Christin Seifert
Abstract:
This paper introduces Multi-Output LOcal Narrative Explanation (MOLONE), a novel comparative explanation method designed to enhance preference selection in human-in-the-loop Preference Bayesian optimization (PBO). The preference elicitation in PBO is a non-trivial task because it involves navigating implicit trade-offs between vector-valued outcomes, subjective priorities of decision-makers, and decision-makers' uncertainty in preference selection. Existing explainable AI (XAI) methods for BO primarily focus on input feature importance, neglecting the crucial role of outputs (objectives) in human preference elicitation. MOLONE addresses this gap by providing explanations that highlight both input and output importance, enabling decision-makers to understand the trade-offs between competing objectives and make more informed preference selections. MOLONE focuses on local explanations, comparing the importance of input features and outcomes across candidate samples within a local neighborhood of the search space, thus capturing nuanced differences relevant to preference-based decision-making. We evaluate MOLONE within a PBO framework using benchmark multi-objective optimization functions, demonstrating its effectiveness in improving convergence compared to noisy preference selections. Furthermore, a user study confirms that MOLONE significantly accelerates convergence in human-in-the-loop scenarios by facilitating more efficient identification of preferred options.
Submitted 1 April, 2025;
originally announced April 2025.
-
Guiding LLMs to Generate High-Fidelity and High-Quality Counterfactual Explanations for Text Classification
Authors:
Van Bach Nguyen,
Christin Seifert,
Jörg Schlötterer
Abstract:
The need for interpretability in deep learning has driven interest in counterfactual explanations, which identify minimal changes to an instance that change a model's prediction. Current counterfactual (CF) generation methods require task-specific fine-tuning and produce low-quality text. Large Language Models (LLMs), though effective for high-quality text generation, struggle with label-flipping counterfactuals (i.e., counterfactuals that change the prediction) without fine-tuning. We introduce two simple classifier-guided approaches to support counterfactual generation by LLMs, eliminating the need for fine-tuning while preserving the strengths of LLMs. Despite their simplicity, our methods outperform state-of-the-art counterfactual generation methods and are effective across different LLMs, highlighting the benefits of guiding counterfactual generation by LLMs with classifier information. We further show that data augmentation by our generated CFs can improve a classifier's robustness. Our analysis reveals a critical issue in counterfactual generation by LLMs: LLMs rely on parametric knowledge rather than faithfully following the classifier.
Submitted 6 March, 2025;
originally announced March 2025.
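The general idea of classifier guidance can be pictured as a generate-and-verify loop: an LLM proposes a counterfactual, the classifier being explained checks whether the label flipped, and its verdict is fed back into the next attempt. This is a generic sketch, not a reproduction of either of the paper's two approaches; `generate` and `classify` are hypothetical stand-ins for an LLM prompt wrapper and the target classifier, and the feedback wording is invented for illustration.

```python
from typing import Callable, Optional

# Hypothetical interfaces (not APIs from the paper):
# generate(text, target_label, feedback) -> candidate counterfactual
# classify(text) -> predicted label
Generator = Callable[[str, str, Optional[str]], str]
Classifier = Callable[[str], str]


def guided_counterfactual(text: str, target_label: str, generate: Generator,
                          classify: Classifier, max_tries: int = 3) -> Optional[str]:
    """Ask the generator for a counterfactual, check it with the classifier,
    and retry with feedback until the classifier predicts target_label."""
    feedback: Optional[str] = None
    for _ in range(max_tries):
        candidate = generate(text, target_label, feedback)
        predicted = classify(candidate)
        if predicted == target_label:
            return candidate  # label flipped according to the classifier
        feedback = f"The classifier still predicts {predicted!r}; revise the edit."
    return None  # no label-flipping counterfactual within the budget


# Toy stand-ins for an LLM and a sentiment classifier.
drafts = iter(["The film was fine.", "The film was great."])


def toy_generate(text: str, target: str, feedback: Optional[str]) -> str:
    return next(drafts)


def toy_classify(text: str) -> str:
    return "pos" if "great" in text else "neg"


result = guided_counterfactual("The film was dull.", "pos", toy_generate, toy_classify)
print(result)  # The film was great.
```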
-
Behavioral Analysis of Information Salience in Large Language Models
Authors:
Jan Trienes,
Jörg Schlötterer,
Junyi Jessy Li,
Christin Seifert
Abstract:
Large Language Models (LLMs) excel at text summarization, a task that requires models to select content based on its importance. However, the exact notion of salience that LLMs have internalized remains unclear. To bridge this gap, we introduce an explainable framework to systematically derive and investigate information salience in LLMs through their summarization behavior. Using length-controlled summarization as a behavioral probe into the content selection process, and tracing the answerability of Questions Under Discussion throughout, we derive a proxy for how models prioritize information. Our experiments on 13 models across four datasets reveal that LLMs have a nuanced, hierarchical notion of salience, generally consistent across model families and sizes. While models show highly consistent behavior and hence salience patterns, this notion of salience cannot be accessed through introspection, and only weakly correlates with human perceptions of information salience.
Submitted 27 May, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
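One simple way to turn length-controlled summaries and Question Under Discussion (QUD) answerability into a salience ranking is to score each question by the fraction of length budgets at which the summary answers it: questions that survive even the shortest summaries are treated as most salient. This scoring rule, the budgets, and the questions below are assumptions for illustration; the paper's actual proxy may be defined differently.

```python
def salience_scores(answerable):
    """Map each question to the fraction of length budgets at which the
    summary answers it; answerable[q] holds one boolean per budget."""
    return {q: sum(flags) / len(flags) for q, flags in answerable.items()}


def rank_by_salience(answerable):
    """Questions sorted from most to least salient (ties broken by name)."""
    scores = salience_scores(answerable)
    return sorted(scores, key=lambda q: (-scores[q], q))


# Hypothetical answerability judgments for three QUDs across summaries of
# increasing length (e.g. 10, 25 and 50 words, shortest first).
answerable = {
    "Q1: What happened?": [True, True, True],
    "Q2: Who was involved?": [False, True, True],
    "Q3: When did it start?": [False, False, True],
}
print(rank_by_salience(answerable))
```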
-
This looks like what? Challenges and Future Research Directions for Part-Prototype Models
Authors:
Khawla Elhadri,
Tomasz Michalski,
Adam Wróbel,
Jörg Schlötterer,
Bartosz Zieliński,
Christin Seifert
Abstract:
The growing interest in eXplainable Artificial Intelligence (XAI) has prompted research into models with built-in interpretability, the most prominent of which are part-prototype models. Part-Prototype Models (PPMs) make decisions by comparing an input image to a set of learned prototypes, providing human-understandable explanations in the form of "this looks like that". Despite their inherent interpretability, PPMs are not yet considered a valuable alternative to post-hoc models. In this survey, we investigate the reasons for this and provide directions for future research. We analyze papers from 2019 to 2024, and derive a taxonomy of the challenges that current PPMs face. Our analysis shows that the open challenges are quite diverse. The main concern is the quality and quantity of prototypes. Other concerns are the lack of generalization to a variety of tasks and contexts, and general methodological issues, including non-standardized evaluation. We provide ideas for future research in five broad directions: improving predictive performance, developing novel architectures grounded in theory, establishing frameworks for human-AI collaboration, aligning models with humans, and establishing metrics and benchmarks for evaluation. We hope that this survey will stimulate research and promote intrinsically interpretable models for application domains. Our list of surveyed papers is available at https://github.com/aix-group/ppm-survey.
Submitted 13 February, 2025;
originally announced February 2025.
-
Position: Editing Large Language Models Poses Serious Safety Risks
Authors:
Paul Youssef,
Zhixue Zhao,
Daniel Braun,
Jörg Schlötterer,
Christin Seifert
Abstract:
Large Language Models (LLMs) contain a large number of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note that KEs are widely available, computationally inexpensive, highly performant, and stealthy, which makes them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.
Submitted 10 June, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
Authors:
Lukas Kuhn,
Sari Sadiya,
Jorg Schlotterer,
Christin Seifert,
Gemma Roig
Abstract:
Shortcut learning, i.e., a model's reliance on undesired features not directly relevant to the task, is a major challenge that severely limits the applications of machine learning algorithms, particularly when deploying them to assist in making sensitive decisions, such as in medical diagnostics. In this work, we leverage recent advancements in machine learning to create an unsupervised framework that is capable of both detecting and mitigating shortcut learning in transformers. We validate our method on multiple datasets. Results demonstrate that our framework significantly improves both worst-group accuracy (accuracy on the group of samples most affected by shortcuts) and average accuracy, while minimizing human annotation effort. Moreover, we demonstrate that the detected shortcuts are meaningful and informative to human experts, and that our framework is computationally efficient, allowing it to be run on consumer hardware.
Submitted 1 January, 2025;
originally announced January 2025.
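Worst-group accuracy, one of the two metrics reported above, is the minimum of the per-group accuracies, where groups are typically combinations of class and spurious attribute; average accuracy can hide a badly served minority group. The sketch below computes both on hypothetical toy predictions; the group names are invented.

```python
from collections import defaultdict


def group_accuracies(labels, predictions, groups):
    """Accuracy per group, e.g. per (class, spurious attribute) combination."""
    if not (len(labels) == len(predictions) == len(groups)):
        raise ValueError("labels, predictions and groups must have equal length")
    correct, total = defaultdict(int), defaultdict(int)
    for y, y_hat, g in zip(labels, predictions, groups):
        total[g] += 1
        correct[g] += int(y == y_hat)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(labels, predictions, groups):
    """Minimum accuracy over all groups."""
    return min(group_accuracies(labels, predictions, groups).values())


# Hypothetical toy data: the model is perfect on the majority group but
# fails on most of the minority group, where the shortcut does not hold.
labels = [0, 0, 1, 1, 1, 0]
predictions = [0, 0, 1, 0, 1, 1]
groups = ["majority", "majority", "majority", "minority", "minority", "minority"]

average = sum(y == p for y, p in zip(labels, predictions)) / len(labels)
print(average, worst_group_accuracy(labels, predictions, groups))
```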
-
Duality for Evolutionary Equations with Applications to Control Theory
Authors:
Andreas Buchinger,
Christian Seifert
Abstract:
We study evolutionary equations in exponentially weighted $\mathrm{L}^{2}$-spaces as introduced by Picard in 2009. First, for a given evolutionary equation, we explicitly describe the $ν$-adjoint system, which turns out to describe a system backwards in time. We prove well-posedness for the $ν$-adjoint system. We then apply the thus obtained duality to introduce and study notions of null-controllability for evolutionary equations.
Submitted 21 November, 2024;
originally announced November 2024.
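For orientation, the setting referred to above is usually written as follows in Picard's framework. The notation ($H$ a Hilbert space, $\partial_{t,\nu}$ the time derivative realised in the weighted space, $M$ a material law, $A$ a skew-selfadjoint operator) is the standard textbook form and is assumed here rather than taken from the paper, whose precise hypotheses may differ.

```latex
\begin{align*}
\mathrm{L}^{2}_{\nu}(\mathbb{R};H) &= \Big\{ f\colon \mathbb{R}\to H \text{ measurable} : \int_{\mathbb{R}} \|f(t)\|_{H}^{2}\, e^{-2\nu t}\,\mathrm{d}t < \infty \Big\}, \qquad \nu > 0,\\
\big(\partial_{t,\nu} M(\partial_{t,\nu}) + A\big)\, U &= F \quad \text{in } \mathrm{L}^{2}_{\nu}(\mathbb{R};H).
\end{align*}
```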
-
Enhancing Fact Retrieval in PLMs through Truthfulness
Authors:
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
Pre-trained Language Models (PLMs) encode various facts about the world during pre-training, as they are trained to predict the next or missing word in a sentence. There has been interest in quantifying and improving the amount of facts that can be extracted from PLMs, as they have been envisioned to act as soft knowledge bases that can be queried in natural language. Different approaches exist to enhance fact retrieval from PLMs. Recent work shows that the hidden states of PLMs can be leveraged to determine the truthfulness of the PLMs' inputs. Leveraging this finding to improve factual knowledge retrieval remains unexplored. In this work, we investigate the use of a helper model to improve fact retrieval. The helper model assesses the truthfulness of an input based on the corresponding hidden-state representations from the PLMs. We evaluate this approach on several masked PLMs and show that it enhances fact retrieval by up to 33%. Our findings highlight the potential of hidden-state representations from PLMs in improving their factual knowledge retrieval.
Submitted 17 October, 2024;
originally announced October 2024.
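One way a truthfulness helper could improve fact retrieval is by re-ranking the PLM's candidate fillers for a masked slot. In the sketch below, `helper_score` stands in for the helper model (in the paper, a model reading the PLM's hidden states for the completed statement), and the linear mix controlled by `alpha` is an assumption for illustration, not necessarily how the paper combines the two signals. The candidates and scores are hypothetical.

```python
from typing import Callable, Sequence


def rerank_with_helper(candidates: Sequence[str], plm_probs: Sequence[float],
                       helper_score: Callable[[str], float],
                       alpha: float = 0.5) -> list[str]:
    """Return candidates best-first by
    alpha * PLM probability + (1 - alpha) * helper truthfulness score,
    with both scores assumed to lie in [0, 1]."""
    if len(candidates) != len(plm_probs):
        raise ValueError("one PLM probability per candidate is required")
    scored = [
        (alpha * p + (1 - alpha) * helper_score(c), c)
        for c, p in zip(candidates, plm_probs)
    ]
    return [c for _, c in sorted(scored, key=lambda sc: sc[0], reverse=True)]


# Hypothetical fillers for "The capital of France is [MASK]." with PLM
# probabilities, and a toy helper that rates each completed statement.
candidates = ["Lyon", "Paris", "Nice"]
plm_probs = [0.5, 0.3, 0.2]
truthfulness = {"Lyon": 0.1, "Paris": 0.9, "Nice": 0.2}

print(rerank_with_helper(candidates, plm_probs, truthfulness.get))             # helper-aware
print(rerank_with_helper(candidates, plm_probs, truthfulness.get, alpha=1.0))  # PLM only
```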
-
How to Make LLMs Forget: On Reversing In-Context Knowledge Edits
Authors:
Paul Youssef,
Zhixue Zhao,
Jörg Schlötterer,
Christin Seifert
Abstract:
In-context knowledge editing (IKE) enables efficient modification of large language model (LLM) outputs without parameter changes and at zero cost. However, it can be misused to manipulate responses opaquely, e.g., to insert misinformation or offensive content. Such malicious interventions could be incorporated into high-level wrapped APIs where the final input prompt is not shown to end-users. To address this issue, we investigate the detection and reversal of IKE-edits. First, we demonstrate that IKE-edits can be detected with high accuracy (F1 > 80%) using only the top-10 output probabilities of the next token, even in a black-box setting, e.g., proprietary LLMs with limited output information. Further, we introduce the novel task of reversing IKE-edits using specially tuned reversal tokens. We explore using both continuous and discrete reversal tokens, achieving over 80% accuracy in recovering original, unedited outputs across multiple LLMs. Our continuous reversal tokens prove particularly effective, with minimal impact on unedited prompts. Through analysis of output distributions, attention patterns, and token rankings, we provide insights into IKE's effects on LLMs and how reversal tokens mitigate them. This work represents a significant step towards enhancing LLM resilience against potential misuse of in-context editing, improving their transparency and trustworthiness.
Submitted 10 April, 2025; v1 submitted 16 October, 2024;
originally announced October 2024.
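The black-box detection setting above uses only the top-10 next-token probabilities. A minimal sketch of turning such an output into a fixed-length feature vector for a downstream detector is shown below: the sorted probabilities, zero-padded to length k, plus the entropy of their renormalised distribution. The entropy feature and the function name are assumptions for illustration; the paper's detector and its exact inputs are not reproduced here.

```python
import math


def topk_features(next_token_probs, k=10):
    """Feature vector from the k largest next-token probabilities: the sorted
    probabilities (zero-padded to length k) plus the entropy of their
    renormalised distribution."""
    top = sorted(next_token_probs, reverse=True)[:k]
    top += [0.0] * (k - len(top))
    mass = sum(top)
    normalised = [p / mass for p in top] if mass > 0 else top
    entropy = -sum(p * math.log(p) for p in normalised if p > 0)
    return top + [entropy]


# Hypothetical output exposing only three probabilities, padded to k = 4.
features = topk_features([0.25, 0.5, 0.25], k=4)
print(features)  # [0.5, 0.25, 0.25, 0.0, <entropy of 1.5 * ln 2>]
```

A classifier (for example, logistic regression) would then be trained on such vectors computed for edited and unedited prompts.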
-
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System
Authors:
Gary D. Lopez Munoz,
Amanda J. Minnich,
Roman Lutz,
Richard Lundeen,
Raja Sekhar Rao Dheekonda,
Nina Chikanov,
Bolor-Erdene Jagdagdorj,
Martin Pouliot,
Shiven Chawla,
Whitney Maxwell,
Blake Bullwinkel,
Katherine Pratt,
Joris de Gruyter,
Charlotte Siska,
Pete Bryan,
Tori Westerhoff,
Chang Kawaguchi,
Christian Seifert,
Ram Shankar Siva Kumar,
Yonatan Zunger
Abstract:
Generative Artificial Intelligence (GenAI) is becoming ubiquitous in our daily lives. The increase in computational power and data availability has led to a proliferation of both single- and multi-modal models. As the GenAI ecosystem matures, the need for extensible and model-agnostic risk identification frameworks is growing. To meet this need, we introduce the Python Risk Identification Toolkit (PyRIT), an open-source framework designed to enhance red teaming efforts in GenAI systems. PyRIT is a model- and platform-agnostic tool that enables red teamers to probe for and identify novel harms, risks, and jailbreaks in multimodal generative AI models. Its composable architecture facilitates the reuse of core building blocks and allows for extensibility to future models and modalities. This paper details the challenges specific to red teaming generative AI systems, the development and features of PyRIT, and its practical applications in real-world scenarios.
Submitted 1 October, 2024;
originally announced October 2024.
-
Investigating the Impact of Randomness on Reproducibility in Computer Vision: A Study on Applications in Civil Engineering and Medicine
Authors:
Bahadır Eryılmaz,
Osman Alperen Koraş,
Jörg Schlötterer,
Christin Seifert
Abstract:
Reproducibility is essential for scientific research. However, in computer vision, achieving consistent results is challenging due to various factors. One influential, yet often unrecognized, factor is CUDA-induced randomness. Despite CUDA's advantages for accelerating algorithm execution on GPUs, its behavior across multiple executions remains non-deterministic if not controlled. While reproducibility issues in ML are being researched, the implications of CUDA-induced randomness in applications are yet to be understood. Our investigation focuses on this randomness across one standard benchmark dataset and two real-world datasets in an isolated environment. Our results show that CUDA-induced randomness can account for differences of up to 4.77% in performance scores. We find that managing this variability for reproducibility may increase runtime or reduce performance, but that these disadvantages are not as significant as reported in previous studies.
Submitted 19 September, 2024;
originally announced October 2024.
-
Out of spuriousity: Improving robustness to spurious correlations without group annotations
Authors:
Phuong Quynh Le,
Jörg Schlötterer,
Christin Seifert
Abstract:
Machine learning models are known to learn spurious correlations, i.e., features that have strong relations with class labels but no causal relation. Relying on those correlations leads to poor performance on data groups without these correlations and to poor generalization. To improve the robustness of machine learning models to spurious correlations, we propose an approach to extract a subnetwork from a fully trained network that does not rely on spurious correlations. The subnetwork is found under the assumption that data points with the same spurious attribute lie close to each other in the representation space of a model trained with ERM; we then employ supervised contrastive loss in a novel way to force the model to unlearn the spurious connections. The increase in worst-group performance achieved by our approach strengthens the hypothesis that a fully trained dense network contains a subnetwork responsible for using only invariant features in classification tasks; extracting it erases the influence of spurious features, even in settings with multiple spurious attributes and no prior knowledge of attribute labels.
Submitted 20 July, 2024;
originally announced July 2024.
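For reference, the standard supervised contrastive (SupCon) loss that the approach builds on pulls together L2-normalised embeddings that share a label and pushes apart the rest: for each anchor, it averages the negative log-softmax of the similarities to its positives over all other samples. The plain-Python sketch below implements only this standard loss; the paper's novel use of it (which samples are treated as positives) is not reproduced, and the toy embeddings are hypothetical. Embeddings are assumed to be non-zero.

```python
import math


def _normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss, averaged over anchors that have
    at least one positive (another sample with the same label)."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [a for a in range(n) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature for a in others}
        m = max(logits.values())  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Two tight clusters of embeddings: low loss when labels match the clusters,
# high loss when labels cut across them.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(emb, [0, 0, 1, 1])
misaligned = supcon_loss(emb, [0, 1, 0, 1])
print(aligned < misaligned)  # True
```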
-
Patch-based Intuitive Multimodal Prototypes Network (PIMPNet) for Alzheimer's Disease classification
Authors:
Lisa Anita De Santi,
Jörg Schlötterer,
Meike Nauta,
Vincenzo Positano,
Christin Seifert
Abstract:
Volumetric neuroimaging examinations like structural Magnetic Resonance Imaging (sMRI) are routinely applied to support the clinical diagnosis of dementia like Alzheimer's Disease (AD). Neuroradiologists examine 3D sMRI to detect and monitor abnormalities in brain morphology due to AD, like global and/or local brain atrophy and shape alteration of characteristic structures. There is strong research interest in developing diagnostic systems based on Deep Learning (DL) models to analyse sMRI for AD. However, anatomical information extracted from an sMRI examination needs to be interpreted together with the patient's age to distinguish AD patterns from the regular alterations due to normal ageing. In this context, part-prototype neural networks integrate the computational advantages of DL in an interpretable-by-design architecture and have shown promising results in medical imaging applications. We present PIMPNet, the first interpretable multimodal model for 3D images and demographics, applied to the binary classification of AD from 3D sMRI and patient age. Although age prototypes do not improve predictive performance compared to the single-modality model, this work lays the foundation for future research on the model's design and the multimodal prototype training process.
Submitted 22 July, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle
Authors:
Emman Haider,
Daniel Perez-Becker,
Thomas Portet,
Piyush Madan,
Amit Garg,
Atabak Ashfaq,
David Majercak,
Wen Wen,
Dongwoo Kim,
Ziyi Yang,
Jianwen Zhang,
Hiteshi Sharma,
Blake Bullwinkel,
Martin Pouliot,
Amanda Minnich,
Shiven Chawla,
Solianna Herrera,
Shahed Warreth,
Maggie Engler,
Gary Lopez,
Nina Chikanov,
Raja Sekhar Rao Dheekonda,
Bolor-Erdene Jagdagdorj,
Roman Lutz,
Richard Lundeen
, et al. (6 additional authors not shown)
Abstract:
Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3 series of language models. We utilized a "break-fix" cycle, performing multiple rounds of dataset curation, safety post-training, benchmarking, red teaming, and vulnerability identification to cover a variety of harm areas in both single and multi-turn scenarios. Our results indicate that this approach iteratively improved the performance of the Phi-3 models across a wide range of responsible AI benchmarks. Finally, we include additional red teaming strategies and evaluations that were used to test the safety behavior of Phi-3.5-mini and Phi-3.5-MoE, which were optimized for multilingual capabilities.
Submitted 22 August, 2024; v1 submitted 18 July, 2024;
originally announced July 2024.
-
Has this Fact been Edited? Detecting Knowledge Edits in Language Models
Authors:
Paul Youssef,
Zhixue Zhao,
Christin Seifert,
Jörg Schlötterer
Abstract:
Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KEs can also be used for malicious applications, e.g., inserting misinformation and toxic content. Knowing whether a generated output is based on edited knowledge or on first-hand knowledge from pre-training can increase users' trust in generative models and provide more transparency. Driven by this, we propose a novel task: detecting edited knowledge in language models. Given an edited model and a fact retrieved from it by a prompt, the objective is to classify the knowledge as either unedited (based on pre-training) or edited (based on subsequent editing). We instantiate the task with four KEs, two LLMs, and two datasets. Additionally, we propose using the hidden state representations and the probability distributions as features for the detection. Our results reveal that using these features as inputs to a simple AdaBoost classifier establishes a strong baseline. This classifier requires only a limited amount of data and maintains its performance even in cross-domain settings. Last, we find it more challenging to distinguish edited knowledge from unedited but related knowledge, highlighting the need for further research. Our work lays the groundwork for addressing malicious model editing, which is a critical challenge associated with the strong generative capabilities of LLMs.
Submitted 10 February, 2025; v1 submitted 4 May, 2024;
originally announced May 2024.
-
LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
Authors:
Van Bach Nguyen,
Paul Youssef,
Christin Seifert,
Jörg Schlötterer
Abstract:
As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs and evaluate their CFs, assessing both intrinsic metrics and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than for Natural Language Inference (NLI), where LLMs show weaknesses in generating CFs that flip the original label. This is also reflected in the data augmentation performance, where we observe a large gap between augmenting with human and LLM-generated CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled data setting and show that they have a strong bias towards agreeing with the provided labels. GPT4 is more robust against this bias, and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.
Submitted 12 November, 2024; v1 submitted 26 April, 2024;
originally announced May 2024.
-
Feature importance to explain multimodal prediction models. A clinical use case
Authors:
Jorn-Jan van de Beld,
Shreyasi Pathak,
Jeroen Geerdink,
Johannes H. Hegeman,
Christin Seifert
Abstract:
Surgery to treat elderly hip fracture patients may cause complications that can lead to early mortality. An early warning system for complications could prompt clinicians to monitor high-risk patients more carefully and address potential complications early, or inform the patient. In this work, we develop a multimodal deep-learning model for post-operative mortality prediction using pre-operative and per-operative data from elderly hip fracture patients. Specifically, the pre-operative data include static patient data and hip and chest images taken before surgery, and the per-operative data include vital signals and medications administered during surgery. We extract features from the image modalities using ResNet and from the vital signals using LSTM. Explainable model outcomes are essential for clinical applicability; therefore, we compute Shapley values to explain the predictions of our multimodal black-box model. We find that i) Shapley values can be used to estimate the relative contribution of each modality both locally and globally, and ii) a modified version of the chain rule can be used to propagate Shapley values through a sequence of models, supporting interpretable local explanations. Our findings imply that a multimodal combination of black-box models can be explained by propagating Shapley values through the model sequence.
Submitted 29 April, 2024;
originally announced April 2024.
-
CEval: A Benchmark for Evaluating Counterfactual Text Generation
Authors:
Van Bach Nguyen,
Jörg Schlötterer,
Christin Seifert
Abstract:
Counterfactual text generation aims to minimally change a text, such that it is classified differently. Judging advancements in method development for counterfactual text generation is hindered by a non-uniform usage of data sets and metrics in related work. We propose CEval, a benchmark for comparing counterfactual text generation methods. CEval unifies counterfactual and text quality metrics, includes common counterfactual datasets with human annotations, standard baselines (MICE, GDBA, CREST) and the open-source language model LLAMA-2. Our experiments found no perfect method for generating counterfactual text. Methods that excel at counterfactual metrics often produce lower-quality text while LLMs with simple prompts generate high-quality text but struggle with counterfactual criteria. By making CEval available as an open-source Python library, we encourage the community to contribute more methods and maintain consistent evaluation in future work.
Submitted 13 August, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
Approximation of Random Evolution Equations of Parabolic type
Authors:
Katharina Klioba,
Christian Seifert
Abstract:
In this paper, we present an abstract framework to obtain convergence rates for the approximation of random evolution equations corresponding to a random family of forms determined by finite-dimensional noise. The full discretization error in space, time, and randomness is considered, where polynomial chaos expansion (PCE) is used for the semi-discretization in randomness. The main results are regularity conditions on the random forms under which convergence of polynomial order in randomness is obtained, depending on the smoothness of the coefficients and the Sobolev regularity of the initial value. In space and time, the same convergence rates as in the deterministic setting are achieved. To this end, we derive error estimates for vector-valued PCE as well as a quantified version of the Trotter--Kato theorem for form-induced semigroups. We apply the abstract framework to an anisotropic diffusion model with random diffusion coefficients.
Submitted 18 December, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding
Authors:
Ahmad Idrissi-Yaghir,
Amin Dada,
Henning Schäfer,
Kamyar Arzideh,
Giulia Baldini,
Jan Trienes,
Max Hasin,
Jeanette Bewersdorff,
Cynthia S. Schmidt,
Marie Bauer,
Kaleb E. Smith,
Jiang Bian,
Yonghui Wu,
Jörg Schlötterer,
Torsten Zesch,
Peter A. Horn,
Christin Seifert,
Felix Nensa,
Jens Kleesiek,
Christoph M. Friedrich
Abstract:
Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models demonstrate remarkable performance on general datasets, they can struggle in specialized domains such as medicine, where unique domain-specific terminologies, domain-specific abbreviations, and varying document structures are common. This paper explores strategies for adapting these models to domain-specific requirements, primarily through continuous pre-training on domain-specific data. We pre-trained several German medical language models on 2.4B tokens derived from translated public English medical data and 3B tokens of German clinical data. The resulting models were evaluated on various German downstream tasks, including named entity recognition (NER), multi-label classification, and extractive question answering. Our results suggest that models augmented by clinical and translation-based pre-training typically outperform general domain models in medical contexts. We conclude that continuous pre-training has demonstrated the ability to match or even exceed the performance of clinical models trained from scratch. Furthermore, pre-training on clinical data or leveraging translated texts have proven to be reliable methods for domain adaptation in medical NLP tasks.
Submitted 8 May, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Prototype-based Interpretable Breast Cancer Prediction Models: Analysis and Challenges
Authors:
Shreyasi Pathak,
Jörg Schlötterer,
Jeroen Veltman,
Jeroen Geerdink,
Maurice van Keulen,
Christin Seifert
Abstract:
Deep learning models have achieved high performance in medical applications; however, their adoption in clinical practice is hindered by their black-box nature. Self-explainable models, like prototype-based models, can be especially beneficial as they are interpretable by design. However, if the learnt prototypes are of low quality, then prototype-based models are effectively black boxes. Having high-quality prototypes is a prerequisite for a truly interpretable model. In this work, we propose a prototype evaluation framework for coherence (PEF-C) for quantitatively evaluating the quality of the prototypes based on domain knowledge. We show the use of PEF-C in the context of breast cancer prediction using mammography. Existing work on prototype-based models for breast cancer prediction using mammography has focused on improving the classification performance of prototype-based models compared to black-box models and has evaluated prototype quality through anecdotal evidence. We are the first to go beyond anecdotal evidence and evaluate the quality of the mammography prototypes systematically using our PEF-C. Specifically, we apply three state-of-the-art prototype-based models, ProtoPNet, BRAIxProtoPNet++ and PIP-Net, to mammography images for breast cancer prediction and evaluate these models w.r.t. i) classification performance and ii) quality of the prototypes, on three public datasets. Our results show that prototype-based models are competitive with black-box models in terms of classification performance and achieve a higher score in detecting ROIs. However, the quality of the prototypes is not yet sufficient and can be improved in terms of relevance, purity and learning a variety of prototypes. We call on the XAI community to systematically evaluate the quality of prototypes to check their true usability in high-stakes decisions and to improve such models further.
Submitted 19 July, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
-
PIPNet3D: Interpretable Detection of Alzheimer in MRI Scans
Authors:
Lisa Anita De Santi,
Jörg Schlötterer,
Michael Scheschenja,
Joel Wessendorf,
Meike Nauta,
Vincenzo Positano,
Christin Seifert
Abstract:
Information from neuroimaging examinations is increasingly used to support diagnoses of dementia, e.g., Alzheimer's disease. While current clinical practice is mainly based on visual inspection and feature engineering, Deep Learning approaches can be used to automate the analysis and to discover new image biomarkers. Part-prototype neural networks (PP-NN) are an alternative to standard blackbox models and have shown promising results in general computer vision. PP-NNs base their reasoning on prototypical image regions that are learned in a fully unsupervised manner and combined with a simple-to-understand decision layer. We present PIPNet3D, a PP-NN for volumetric images. We apply PIPNet3D to the clinical diagnosis of Alzheimer's Disease from structural Magnetic Resonance Imaging (sMRI). We assess the quality of prototypes under a systematic evaluation framework, propose new functionally grounded metrics to evaluate brain prototypes and develop an evaluation scheme to assess their coherency with domain experts. Our results show that PIPNet3D is an interpretable, compact model for Alzheimer's diagnosis, with its reasoning well aligned to medical domain knowledge. Notably, PIPNet3D achieves the same accuracy as its blackbox counterpart, and removing the remaining clinically irrelevant prototypes from its decision process does not decrease predictive performance.
Submitted 22 July, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
A Second Look on BASS -- Boosting Abstractive Summarization with Unified Semantic Graphs -- A Replication Study
Authors:
Osman Alperen Koraş,
Jörg Schlötterer,
Christin Seifert
Abstract:
We present a detailed replication study of the BASS framework, an abstractive summarization system based on the notion of Unified Semantic Graphs. Our investigation includes challenges in replicating key components and an ablation study to systematically isolate error sources rooted in replicating novel components. Our findings reveal discrepancies in performance compared to the original work. We highlight the significance of paying careful attention even to reasonably omitted details for replicating advanced frameworks like BASS, and emphasize key practices for writing replicable papers.
Submitted 25 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
On solitary waves for the Korteweg--de Vries equation on metric star graphs
Authors:
Delio Mugnolo,
Diego Noja,
Christian Seifert
Abstract:
We study the Korteweg--de Vries equation on a metric star graph and investigate the existence of solitary waves on the metric graph in terms of the coefficients of the equation on each edge, the coupling condition at the central vertex of the star, and the speeds of the travelling wave. We show that, with a continuity condition at the vertex, solitary waves can occur exactly when the parameters are chosen in a fairly special manner. We also consider coupling conditions beyond continuity.
Submitted 13 February, 2024;
originally announced February 2024.
-
The Queen of England is not England's Queen: On the Lack of Factual Coherency in PLMs
Authors:
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
Factual knowledge encoded in Pre-trained Language Models (PLMs) enriches their representations and justifies their use as knowledge bases. Previous work has focused on probing PLMs for factual knowledge by measuring how often they can correctly predict an object entity given a subject and a relation, and on improving fact retrieval by optimizing the prompts used for querying PLMs. In this work, we consider a complementary aspect, namely the coherency of factual knowledge in PLMs, i.e., how often PLMs can predict the subject entity given their initial prediction of the object entity. This goes beyond evaluating how much PLMs know and focuses on the internal state of knowledge inside them. Our results indicate that PLMs have low coherency using manually written, optimized and paraphrased prompts, but including an evidence paragraph leads to substantial improvement. This shows that PLMs fail to model inverse relations and need further enhancements to be able to retrieve facts from their parameters in a coherent manner and to be considered as knowledge bases.
Submitted 2 February, 2024;
originally announced February 2024.
-
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Authors:
Jan Trienes,
Sebastian Joseph,
Jörg Schlötterer,
Christin Seifert,
Kyle Lo,
Wei Xu,
Byron C. Wallace,
Junyi Jessy Li
Abstract:
Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial language models, and a natural language inference pipeline. With a novel evaluation framework considering the correctness of QA pairs and their linguistic suitability, our expert evaluation reveals that models struggle to reliably identify information loss and to apply the same standards as humans regarding what constitutes information loss.
Submitted 4 June, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Explainable Bayesian Optimization
Authors:
Tanmay Chakraborty,
Christian Wirth,
Christin Seifert
Abstract:
Manual parameter tuning of cyber-physical systems is a common practice, but it is labor-intensive. Bayesian Optimization (BO) offers an automated alternative, yet its black-box nature reduces trust and limits human-BO collaborative system tuning. Experts struggle to interpret BO recommendations due to the lack of explanations. This paper addresses the post-hoc BO explainability problem for cyber-physical systems. We introduce TNTRules (Tune-No-Tune Rules), a novel algorithm that provides both global and local explanations for BO recommendations. TNTRules generates actionable rules and visual graphs, identifying optimal solution bounds and ranges, as well as potential alternative solutions. Unlike existing explainable AI (XAI) methods, TNTRules is tailored specifically for BO, by encoding uncertainty via a variance pruning technique and hierarchical agglomerative clustering. A multi-objective optimization approach allows maximizing explanation quality. We evaluate TNTRules using established XAI metrics (Correctness, Completeness, and Compactness) and compare it against adapted baseline methods. The results demonstrate that TNTRules generates high-fidelity, compact, and complete explanations, significantly outperforming three baselines on 5 multi-objective testing functions and 2 hyperparameter tuning problems.
Submitted 1 April, 2025; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Trust your BMS: Designing a Lightweight Authentication Architecture for Industrial Networks
Authors:
Fikret Basic,
Christian Steger,
Christian Seifert,
Robert Kofler
Abstract:
With the advent of clean energy awareness and systems that rely on extensive battery usage, the community has seen an increased interest in the development of more complex and secure Battery Management Systems (BMS). In particular, the inclusion of BMS in modern complex systems like electric vehicles and power grids has presented a new set of security-related challenges. A particular concern arises when BMS extend their communication to external system networks, as this interaction can leave open many backdoors that potential attackers could exploit. Hence, it is highly desirable to find a general design that can be used for BMS and their integration into larger systems. In this work, we propose a security architecture for the communication between BMS and other system devices. The proposed architecture aims to be easily applicable in different industrial settings and systems while keeping the design lightweight.
Submitted 9 November, 2023;
originally announced November 2023.
-
Feature Attribution Explanations for Spiking Neural Networks
Authors:
Elisa Nguyen,
Meike Nauta,
Gwenn Englebienne,
Christin Seifert
Abstract:
Third-generation artificial neural networks, Spiking Neural Networks (SNNs), can be efficiently implemented on hardware. Their implementation on neuromorphic chips opens a broad range of applications, such as machine learning-based autonomous control and intelligent biomedical devices. In critical applications, however, insight into the reasoning of SNNs is important; thus, SNNs need to be equipped with the ability to explain how decisions are reached. We present Temporal Spike Attribution (TSA), a local explanation method for SNNs. To compute the explanation, we aggregate all information available in model-internal variables: spike times and model weights. We evaluate TSA on artificial and real-world time series data and measure explanation quality w.r.t. multiple quantitative criteria. We find that TSA correctly identifies a small subset of input features relevant to the decision (i.e., is output-complete and compact) and generates similar explanations for similar inputs (i.e., is continuous). Further, our experiments show that incorporating the notion of absent spikes improves explanation quality. Our work can serve as a starting point for explainable SNNs, with future implementations on hardware yielding not only predictions but also explanations in a broad range of application scenarios. Source code is available at https://github.com/ElisaNguyen/tsa-explanations.
Submitted 2 November, 2023;
originally announced November 2023.
-
Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models
Authors:
Paul Youssef,
Osman Alperen Koraş,
Meijie Li,
Jörg Schlötterer,
Christin Seifert
Abstract:
Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world knowledge. This fact has sparked the interest of the community in quantifying the amount of factual knowledge present in PLMs, as this explains their performance on downstream tasks, and potentially justifies their use as knowledge bases. In this work, we survey methods and datasets that are used to probe PLMs for factual knowledge. Our contributions are: (1) We propose a categorization scheme for factual probing methods that is based on how their inputs, outputs and the probed PLMs are adapted; (2) We provide an overview of the datasets used for factual probing; (3) We synthesize insights about knowledge retention and prompt optimization in PLMs, analyze obstacles to adopting PLMs as knowledge bases and outline directions for future work.
Submitted 4 December, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Case-level Breast Cancer Prediction for Real Hospital Settings
Authors:
Shreyasi Pathak,
Jörg Schlötterer,
Jeroen Geerdink,
Jeroen Veltman,
Maurice van Keulen,
Nicola Strisciuglio,
Christin Seifert
Abstract:
Breast cancer prediction models for mammography assume that annotations are available for individual images or regions of interest (ROIs), and that there is a fixed number of images per patient. These assumptions do not hold in real hospital settings, where clinicians provide only a final diagnosis for the entire mammography exam (case). Since data in real hospital settings scales with continuous patient intake, while manual annotation efforts do not, we develop a framework for case-level breast cancer prediction that does not require any manual annotation and can be trained with case labels readily available at the hospital. Specifically, we propose a two-level multi-instance learning (MIL) approach at patch and image level for case-level breast cancer prediction and evaluate it on two public datasets and one private dataset. We propose a novel domain-specific MIL pooling, based on the observation that breast cancer may or may not occur on both sides, while images of both breasts are taken as a precaution during mammography. We propose a dynamic training procedure for training our MIL framework on a variable number of images per case. We show that our two-level MIL model can be applied in real hospital settings where only case labels and a variable number of images per case are available, without any loss in performance compared to models trained on image labels. Although trained only with weak (case-level) labels, it can point out the breast side, mammography view, and view region in which the abnormality lies.
Submitted 19 October, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
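The two-level idea (patches pooled into an image score, images pooled into a case score) can be sketched with plain max pooling, which also handles a variable number of images per case. This is a generic MIL baseline under that assumption, not the paper's domain-specific pooling, whose exact form the abstract does not give. The dictionary layout and helper names are hypothetical.

```python
from typing import Dict, List


def image_score(patch_scores: List[float]) -> float:
    """Level 1: pool patch-level malignancy scores into a single image score (max pooling)."""
    return max(patch_scores)


def case_score(case: Dict[str, List[List[float]]]) -> float:
    """Level 2: pool over a variable number of images per breast side, then over the sides.

    A case is flagged if either side looks suspicious, so cancer on one side suffices.
    """
    side_scores = [
        max(image_score(img) for img in images)
        for images in case.values()
        if images
    ]
    return max(side_scores)


def most_suspicious_side(case: Dict[str, List[List[float]]]) -> str:
    """Return the side whose best image score is highest."""
    return max(
        (side for side, images in case.items() if images),
        key=lambda side: max(image_score(img) for img in case[side]),
    )


case = {
    "left": [[0.10, 0.20], [0.05, 0.30, 0.10]],  # two views with different patch counts
    "right": [[0.90, 0.40]],                     # a single view
}
print(case_score(case), most_suspicious_side(case))
```

Max pooling is the simplest choice that makes a positive case label consistent with an abnormality on only one side; learned or attention-based pooling would replace `max` in both levels.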
-
Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces
Authors:
Hariharan Subramonyam,
Roy Pea,
Christopher Lawrence Pondoc,
Maneesh Agrawala,
Colleen Seifert
Abstract:
Large language models (LLMs) exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface designers and end-users alike. A central issue is our limited grasp of how human cognitive processes begin with a goal and form intentions for executing actions, a blind spot even in established interaction models such as Norman's gulfs of execution and evaluation. To address this gap, we theorize how end-users 'envision' translating their goals into clear intentions and craft prompts to obtain the desired LLM response. We define a process of Envisioning by highlighting three misalignments: (1) knowing whether LLMs can accomplish the task, (2) knowing how to instruct the LLM to do the task, and (3) knowing how to evaluate whether the LLM's output meets the goal. Finally, we make recommendations to narrow the envisioning gulf in human-LLM interactions.
Submitted 18 March, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Know What Not To Know: Users' Perception of Abstaining Classifiers
Authors:
Andrea Papenmeier,
Daniel Hienert,
Yvonne Kammerer,
Christin Seifert,
Dagmar Kern
Abstract:
Machine learning systems can help humans to make decisions by providing decision suggestions (i.e., a label for a datapoint). However, individual datapoints do not always provide enough clear evidence to make confident suggestions. Although methods exist that enable systems to identify those datapoints and subsequently abstain from suggesting a label, it remains unclear how users would react to such system behavior. This paper presents first findings from a user study on systems that do or do not abstain from labeling ambiguous datapoints. Our results show that label suggestions on ambiguous datapoints bear a high risk of unconsciously influencing the users' decisions, even toward incorrect ones. Furthermore, participants perceived a system that abstains from labeling uncertain datapoints as equally competent and trustworthy as a system that delivers label suggestions for all datapoints. Consequently, if abstaining does not impair a system's credibility, it can be a useful mechanism to increase decision quality.
Submitted 11 September, 2023;
originally announced September 2023.
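A common way for a classifier to abstain is to threshold its confidence: it suggests the most probable label only when that label's probability clears a cutoff, and otherwise returns no suggestion. The sketch below shows this rule. The threshold of 0.8 and the function name are hypothetical, and the study's systems may have decided when to abstain differently.

```python
from typing import Optional, Sequence


def predict_or_abstain(
    probs: Sequence[float], labels: Sequence[str], threshold: float = 0.8
) -> Optional[str]:
    """Return the most probable label, or None (abstain) if its probability is below threshold."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best] if probs[best] >= threshold else None


print(predict_or_abstain([0.90, 0.10], ["spam", "ham"]))  # clear datapoint: label suggested
print(predict_or_abstain([0.55, 0.45], ["spam", "ham"]))  # ambiguous datapoint: abstain
```

Raising the threshold trades coverage (how many datapoints receive a suggestion) for fewer confident suggestions on ambiguous datapoints.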
-
Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?
Authors:
Phuong Quynh Le,
Jörg Schlötterer,
Christin Seifert
Abstract:
Models trained with empirical risk minimization (ERM) are known to learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features which are strongly correlated with class labels but lack causal reasoning. This behavior particularly degrades accuracy in groups of samples of the correlated class that are missing the spurious feature, or samples of the opposite class that have the spurious feature present. The recently proposed Deep Feature Reweighting (DFR) method improves the accuracy of these worst-performing groups. Based on the main argument that ERM models can learn core features sufficiently well, DFR only needs to retrain the last layer of the classification model with a small group-balanced data set. In this work, we examine the applicability of DFR to realistic data in the medical domain. Furthermore, we investigate the reasoning behind the effectiveness of last-layer retraining and show that even though DFR has the potential to improve the accuracy of the worst group, it remains susceptible to spurious correlations.
Submitted 9 January, 2024; v1 submitted 1 August, 2023;
originally announced August 2023.
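Last-layer retraining in the DFR style keeps the backbone frozen and refits only the classification head on a small group-balanced subset, where groups are defined by (class label, spurious attribute). The minimal sketch below assumes precomputed toy "frozen" features, with a core feature in position 0 and a spurious one in position 1, and fits a plain logistic-regression head by gradient descent. The helper names, the toy data, and the training settings are hypothetical; DFR's exact training details (for example, regularization) are not reproduced here.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_balanced_subset(groups: Sequence[int], seed: int = 0) -> List[int]:
    """Sample the same number of indices from every group (the size of the smallest group)."""
    by_group: Dict[int, List[int]] = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    n = min(len(idxs) for idxs in by_group.values())
    rng = random.Random(seed)
    subset: List[int] = []
    for idxs in by_group.values():
        subset.extend(rng.sample(idxs, n))
    return sorted(subset)


def retrain_last_layer(
    features: Sequence[Sequence[float]], labels: Sequence[int],
    epochs: int = 500, lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Fit a binary logistic-regression head on frozen features with full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - y
            grad_w = [gw + err * xi for gw, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Toy embeddings: most class-1 samples carry the spurious feature, most class-0 samples do not.
features = [[1.0, 1.0], [0.9, 1.0], [1.1, 1.0], [1.0, 0.0],
            [-1.0, 0.0], [-0.9, 0.0], [-1.1, 0.0], [-1.0, 1.0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
spurious = [1, 1, 1, 0, 0, 0, 0, 1]
groups = [2 * y + s for y, s in zip(labels, spurious)]  # group id encodes (label, spurious)

idx = group_balanced_subset(groups)
w, b = retrain_last_layer([features[i] for i in idx], [labels[i] for i in idx])
print([predict(w, b, x) for x in features])
```

Because each (label, spurious) group contributes equally to the balanced subset, the spurious feature carries no label information there, so the refit head leans on the core feature.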
-
The Co-12 Recipe for Evaluating Interpretable Part-Prototype Image Classifiers
Authors:
Meike Nauta,
Christin Seifert
Abstract:
Interpretable part-prototype models are computer vision models that are explainable by design. The models learn prototypical parts and recognise these components in an image, thereby combining classification and explanation. Despite the recent attention to intrinsically interpretable models, there is no comprehensive overview of evaluating the explanation quality of interpretable part-prototype models. Based on the Co-12 properties for explanation quality as introduced in arXiv:2201.08164 (e.g., correctness, completeness, compactness), we review existing work that evaluates part-prototype models, reveal research gaps and outline future approaches for evaluation of the explanation quality of part-prototype models. This paper, therefore, contributes to the progression and maturity of this relatively new research field on interpretable part-prototype models. We additionally provide a "Co-12 cheat sheet" that acts as a concise summary of our findings on evaluating part-prototype models.
Submitted 26 July, 2023;
originally announced July 2023.
-
Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis
Authors:
Jan Trienes,
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
Automatically summarizing radiology reports into a concise impression can reduce the manual burden of clinicians and improve the consistency of reporting. Previous work aimed to enhance content selection and factuality through guided abstractive summarization. However, two key issues persist. First, current methods heavily rely on domain-specific resources to extract the guidance signal, limiting their transferability to domains and languages where those resources are unavailable. Second, while automatic metrics like ROUGE show progress, we lack a good understanding of the errors and failure modes in this task. To bridge these gaps, we first propose a domain-agnostic guidance signal in the form of variable-length extractive summaries. Our empirical results on two English benchmarks demonstrate that this guidance signal improves upon unguided summarization while being competitive with domain-specific methods. Additionally, we run an expert evaluation of four systems according to a taxonomy of 11 fine-grained errors. We find that the most pressing differences between automatic summaries and those of radiologists relate to content selection, including omissions (up to 52%) and additions (up to 57%). We hypothesize that latent reporting factors and corpus-level inconsistencies may prevent models from reliably learning content selection from the available data, presenting promising directions for future work.
Submitted 24 July, 2023;
originally announced July 2023.
-
Interpreting and Correcting Medical Image Classification with PIP-Net
Authors:
Meike Nauta,
Johannes H. Hegeman,
Jeroen Geerdink,
Jörg Schlötterer,
Maurice van Keulen,
Christin Seifert
Abstract:
Part-prototype models are explainable-by-design image classifiers, and a promising alternative to black box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision making process is in line with medical classification standards, while only provided with image-level class labels. Because of PIP-Net's unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.
Submitted 11 September, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
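The correction step described above (disabling undesired prototypes) can be illustrated under a simplifying assumption: each class score is a weighted sum of prototype presence scores, so disabling a prototype means zeroing its weights to every class. The sketch below uses toy weights and presence scores; the helper names are hypothetical and this is not PIP-Net's code.

```python
from typing import List, Sequence


def class_scores(presence: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Score each class as a weighted sum of prototype presence scores.

    weights[c][p] connects prototype p to class c (non-negative in this toy example).
    """
    return [sum(w * s for w, s in zip(row, presence)) for row in weights]


def disable_prototype(weights: Sequence[Sequence[float]], proto: int) -> List[List[float]]:
    """Return a copy of the weights in which prototype `proto` contributes to no class."""
    return [[0.0 if p == proto else w for p, w in enumerate(row)] for row in weights]


presence = [0.9, 0.0, 0.8]        # prototype 2 might, e.g., respond to a text artefact in an X-ray
weights = [[1.0, 0.5, 0.0],       # class 0
           [0.0, 1.0, 2.0]]       # class 1 relies heavily on prototype 2

before = class_scores(presence, weights)
after = class_scores(presence, disable_prototype(weights, proto=2))
print(before.index(max(before)), after.index(max(after)))
```

Here the prediction flips from class 1 to class 0 once the undesired prototype is disabled, which is the kind of manual correction the abstract describes.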
-
Spectral Theory for Schrödinger operators on compact metric graphs with $δ$ and $δ'$ couplings: a survey
Authors:
Jonathan Rohleder,
Christian Seifert
Abstract:
Spectral properties of Schrödinger operators on compact metric graphs are studied and special emphasis is put on differences in the spectral behavior between different classes of vertex conditions. We survey recent results especially for $δ$ and $δ'$ couplings and demonstrate the spectral properties on many examples. Amongst other things, properties of the ground state eigenvalue and eigenfunction and the spectral behavior under various perturbations of the metric graph or the vertex conditions are considered.
Submitted 3 July, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
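For reference, the δ and δ' couplings at a vertex $v$ are commonly written as below, with $f_e$ the restriction of $f$ to an incident edge $e$ and $\partial_e f(v)$ the derivative of $f_e$ at $v$ taken in the direction into the edge. Sign and orientation conventions vary across the literature, so the survey's normalization may differ; $\alpha = 0$ recovers the standard (Kirchhoff) conditions.

```latex
% Schrödinger operator acting on each edge e: (Hf)_e = -f_e'' + q_e f_e
% \delta-coupling of strength \alpha \in \mathbb{R} at the vertex v:
f_e(v) = f_{e'}(v) =: f(v) \quad \text{for all } e, e' \sim v, \qquad
\sum_{e \sim v} \partial_e f(v) = \alpha\, f(v).
% \delta'-coupling of strength \beta \in \mathbb{R} at the vertex v:
\partial_e f(v) = \partial_{e'} f(v) =: \partial f(v) \quad \text{for all } e, e' \sim v, \qquad
\sum_{e \sim v} f_e(v) = \beta\, \partial f(v).
```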
-
Perturbations of non-autonomous second-order abstract Cauchy problems
Authors:
Christian Budde,
Christian Seifert
Abstract:
In this paper we present time-dependent perturbations of second-order non-autonomous abstract Cauchy problems associated to a family of operators with constant domain. We make use of the equivalence to a first-order non-autonomous abstract Cauchy problem in a product space, which we elaborate in full detail. As an application we provide a perturbed non-autonomous wave equation.
Submitted 22 February, 2023;
originally announced February 2023.
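The equivalence to a first-order problem can be illustrated by the standard reduction sketched below, assuming operators $A(t)$ with a common domain on a Banach space $X$. The precise product space and the conditions on the perturbation $B(t)$ are part of the paper's analysis and are not reproduced here.

```latex
% Second-order non-autonomous problem with initial data at time s
u''(t) = A(t)\,u(t), \qquad u(s) = x, \quad u'(s) = y.
% With U(t) = (u(t), u'(t))^{\top} this becomes a first-order problem on a product space
U'(t) = \mathcal{A}(t)\,U(t), \qquad
\mathcal{A}(t) = \begin{pmatrix} 0 & I \\ A(t) & 0 \end{pmatrix}, \qquad
U(s) = \begin{pmatrix} x \\ y \end{pmatrix}.
% A time-dependent perturbation A(t) \to A(t) + B(t) changes only the lower-left block
\mathcal{A}(t) + \begin{pmatrix} 0 & 0 \\ B(t) & 0 \end{pmatrix}.
```

The first row of the system reproduces $u' = u'$, and the second row reproduces the original equation $u'' = A(t)u$.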
-
How Accurate Does It Feel? -- Human Perception of Different Types of Classification Mistakes
Authors:
Andrea Papenmeier,
Dagmar Kern,
Daniel Hienert,
Yvonne Kammerer,
Christin Seifert
Abstract:
Supervised machine learning utilizes large datasets, often with ground truth labels annotated by humans. While some data points are easy to classify, others are hard to classify, which reduces the inter-annotator agreement. This causes noise for the classifier and might affect the user's perception of the classifier's performance. In our research, we investigated whether the classification difficulty of a data point influences how strongly a prediction mistake reduces the "perceived accuracy". In an experimental online study, 225 participants interacted with three fictive classifiers with equal accuracy (73%). The classifiers made prediction mistakes on three different types of data points (easy, difficult, impossible). After the interaction, participants judged the classifier's accuracy. We found that not all prediction mistakes reduced the perceived accuracy equally. Furthermore, the perceived accuracy differed significantly from the calculated accuracy. To conclude, accuracy and related measures seem unsuitable to represent how users perceive the performance of classifiers.
Submitted 13 February, 2023;
originally announced February 2023.
-
From Black Boxes to Conversations: Incorporating XAI in a Conversational Agent
Authors:
Van Bach Nguyen,
Jörg Schlötterer,
Christin Seifert
Abstract:
The goal of Explainable AI (XAI) is to design methods to provide insights into the reasoning process of black-box models, such as deep neural networks, in order to explain them to humans. Social science research states that such explanations should be conversational, similar to human-to-human explanations. In this work, we show how to incorporate XAI in a conversational agent, using a standard design for the agent comprising natural language understanding and generation components. We build upon an XAI question bank, which we extend by quality-controlled paraphrases, to understand the user's information needs. We further systematically survey the literature for suitable explanation methods that provide the information to answer those questions, and present a comprehensive list of suggestions. Our work is the first step towards truly natural conversations about machine learning models with an explanation agent. The comprehensive list of XAI questions and the corresponding explanation methods may support other researchers in providing the necessary information to address users' demands. To facilitate future work, we release our source code and data.
Submitted 22 July, 2024; v1 submitted 6 September, 2022;
originally announced September 2022.
-
Human-AI Guidelines in Practice: Leaky Abstractions as an Enabler in Collaborative Software Teams
Authors:
Hariharan Subramonyam,
Jane Im,
Colleen Seifert,
Eytan Adar
Abstract:
In conventional software development, user experience (UX) designers and engineers collaborate through separation of concerns (SoC): designers create human interface specifications, and engineers build to those specifications. However, we argue that Human-AI systems thwart SoC because human needs must shape the design of the AI interface, the underlying AI sub-components, and training data. How do designers and engineers currently collaborate on AI and UX design? To find out, we interviewed 21 industry professionals (UX researchers, AI engineers, data scientists, and managers) across 14 organizations about their collaborative work practices and associated challenges. We find that hidden information encapsulated by SoC challenges collaboration across design and engineering concerns. Practitioners describe inventing ad-hoc representations exposing low-level design and implementation details (which we characterize as leaky abstractions) to "puncture" SoC and share information across expertise boundaries. We identify how leaky abstractions are employed to collaborate at the AI-UX boundary and formalize a process of creating and using leaky abstractions.
Submitted 4 July, 2022;
originally announced July 2022.
-
A note on the Lumer--Phillips theorem for bi-continuous semigroups
Authors:
Karsten Kruse,
Christian Seifert
Abstract:
Given a Banach space $X$ and an additional coarser Hausdorff locally convex topology $τ$ on $X$ we characterise the generators of $τ$-bi-continuous semigroups in the spirit of the Lumer--Phillips theorem, i.e. by means of dissipativity w.r.t. a directed system of seminorms and a range condition.
Submitted 8 November, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
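For comparison, the classical Lumer–Phillips theorem on a Banach space $X$, which the paper adapts to the bi-continuous setting, can be stated as follows. The bi-continuous version replaces the norm-based dissipativity with dissipativity with respect to a directed system of seminorms, as described in the abstract; its exact hypotheses are not reproduced here.

```latex
% Classical Lumer--Phillips theorem for a linear operator A : D(A) \subseteq X \to X
A \text{ generates a strongly continuous contraction semigroup on } X
\iff
\begin{cases}
\overline{D(A)} = X, \\
\|(\lambda - A)x\| \ge \lambda\,\|x\| \quad \text{for all } \lambda > 0,\ x \in D(A) \quad \text{(dissipativity)}, \\
\operatorname{ran}(\lambda_0 - A) = X \quad \text{for some } \lambda_0 > 0 \quad \text{(range condition)}.
\end{cases}
```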
-
Final state observability estimates and cost-uniform approximate null-controllability for bi-continuous semigroups
Authors:
Karsten Kruse,
Christian Seifert
Abstract:
We consider final state observability estimates for bi-continuous semigroups on Banach spaces, i.e. for every initial value, estimating the state at a final time $T>0$ by taking into account the orbit of the initial value under the semigroup for $t\in [0,T]$, measured in a suitable norm. We state a sufficient criterion based on an uncertainty relation and a dissipation estimate and provide two examples of bi-continuous semigroups which share a final state observability estimate, namely the Gauss-Weierstrass semigroup and the Ornstein-Uhlenbeck semigroup on the space of bounded continuous functions. Moreover, we generalise the duality between cost-uniform approximate null-controllability and final state observability estimates to the setting of locally convex spaces for the case of bounded and continuous control functions, which seems to be new even for the Banach space case.
Submitted 31 March, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
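To make the objects concrete: the Gauss–Weierstrass semigroup on the bounded continuous functions $C_b(\mathbb{R}^d)$ is the heat semigroup given by convolution with the Gaussian kernel, and a final state observability estimate bounds the final state by the orbit. The abstract leaves the norm on the orbit unspecified (in typical examples it involves observing only part of the domain), so $\|\cdot\|_{\mathrm{obs}}$ below is a placeholder.

```latex
% Gauss--Weierstrass (heat) semigroup on C_b(\mathbb{R}^d), generated by the Laplacian
(S(t)f)(x) = \frac{1}{(4\pi t)^{d/2}} \int_{\mathbb{R}^d} e^{-|x-y|^2/(4t)}\, f(y)\,\mathrm{d}y,
\qquad t > 0, \qquad S(0) = I.
% Generic shape of a final state observability estimate (norm on the orbit left abstract)
\|S(T)x\| \le C_{\mathrm{obs}}\, \big\| (S(t)x)_{t \in [0,T]} \big\|_{\mathrm{obs}},
\qquad x \in X.
```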
-
Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers
Authors:
Stefan Haller,
Adina Aldea,
Christin Seifert,
Nicola Strisciuglio
Abstract:
Automated short answer grading (ASAG) has gained attention in education as a means to scale educational tasks to the growing number of students. Recent progress in Natural Language Processing and Machine Learning has largely influenced the field of ASAG, of which we survey the recent research advancements. We complement previous surveys by providing a comprehensive analysis of recently published methods that deploy deep learning approaches. In particular, we focus our analysis on the transition from hand-engineered features to representation learning approaches, which learn representative features for the task at hand automatically from large corpora of data. We structure our analysis of deep learning methods along three categories: word embeddings, sequential models, and attention-based methods. Deep learning has impacted ASAG differently than other fields of NLP: we observe that learned representations alone do not achieve the best results, but rather work in a complementary way with hand-engineered features. Indeed, the best performance is achieved by methods that combine carefully hand-engineered features with the semantic representations provided by the latest models, such as transformer architectures. We identify challenges and provide an outlook on research directions that can be addressed in the future.
Submitted 11 March, 2022;
originally announced April 2022.
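The observation that learned representations work best alongside hand-engineered features can be illustrated by building one feature vector that mixes both: a semantic similarity between answer embeddings plus simple lexical features. In this minimal sketch, the embeddings are toy vectors standing in for the output of a hypothetical encoder (for example, a transformer), the two lexical features are illustrative choices, and the resulting vector would be fed to any downstream grader.

```python
import math
import re
from typing import List, Sequence


def tokens(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, or 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_features(
    student: str, reference: str,
    student_emb: Sequence[float], reference_emb: Sequence[float],
) -> List[float]:
    """[embedding similarity, token Jaccard overlap, length ratio] for one student answer."""
    s, r = set(tokens(student)), set(tokens(reference))
    overlap = len(s & r) / len(s | r) if (s or r) else 0.0
    ref_len = len(tokens(reference))
    length_ratio = len(tokens(student)) / ref_len if ref_len else 0.0
    return [cosine(student_emb, reference_emb), overlap, length_ratio]


print(hybrid_features(
    "Plants make food by photosynthesis",
    "Photosynthesis lets plants make food from light",
    [1.0, 0.0], [1.0, 0.0],  # toy embeddings from a hypothetical encoder
))
```

Keeping the lexical features separate from the embedding similarity lets a downstream model weight them independently, which matches the complementary role the survey reports.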