Showing 1–9 of 9 results for author: Wäldchen, S

  1. arXiv:2506.19382

    cs.CL

    Measuring and Guiding Monosemanticity

    Authors: Ruben Härle, Felix Friedrich, Manuel Brack, Stephan Wäldchen, Björn Deiseroth, Patrick Schramowski, Kristian Kersting

    Abstract: There is growing interest in leveraging mechanistic interpretability and controllability to better understand and influence the internal dynamics of large language models (LLMs). However, current methods face fundamental challenges in reliably localizing and manipulating feature representations. Sparse Autoencoders (SAEs) have recently emerged as a promising direction for feature extraction at sca…

    Submitted 24 June, 2025; originally announced June 2025.

  2. arXiv:2505.00022

    cs.CL cs.AI cs.LG

    Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation

    Authors: Thomas F Burns, Letitia Parcalabescu, Stephan Wäldchen, Michael Barlow, Gregor Ziegltrum, Volker Stampa, Bastian Harren, Björn Deiseroth

    Abstract: Scaling data quantity is essential for large language models (LLMs), yet recent findings show that data quality can significantly boost performance and training efficiency. We introduce a German-language dataset curation pipeline that combines heuristic and model-based filtering techniques with synthetic data generation. We use our pipeline to create Aleph-Alpha-GermanWeb, a large-scale German pre…

    Submitted 23 May, 2025; v1 submitted 24 April, 2025; originally announced May 2025.

    Comments: 10 pages, 3 figures

  3. arXiv:2306.04505

    cs.LG cs.AI cs.CC cs.CR

    Hardness of Deceptive Certificate Selection

    Authors: Stephan Wäldchen

    Abstract: Recent progress towards theoretical interpretability guarantees for AI has been made with classifiers that are based on interactive proof systems. A prover selects a certificate from the datapoint and sends it to a verifier who decides the class. In the context of machine learning, such a certificate can be a feature that is informative of the class. For a setup with high soundness and completenes…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 15 pages, 3 figures

    MSC Class: 68T01; 91A06 ACM Class: I.2.0

  4. arXiv:2206.00759

    cs.LG cs.AI

    Interpretability Guarantees with Merlin-Arthur Classifiers

    Authors: Stephan Wäldchen, Kartikey Sharma, Berkant Turan, Max Zimmer, Sebastian Pokutta

    Abstract: We propose an interactive multi-agent classifier that provides provable interpretability guarantees even for complex agents such as neural networks. These guarantees consist of lower bounds on the mutual information between selected features and the classification decision. Our results are inspired by the Merlin-Arthur protocol from Interactive Proof Systems and express these bounds in terms of me…

    Submitted 22 March, 2024; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: AISTATS24 Camera-Ready Version, 34 pages total (9 pages main part, 3 pages references, 22 pages appendix), 17 figures, 3 tables

    MSC Class: 68T01; 91A06 ACM Class: I.2.0

  5. arXiv:2202.11797

    cs.LG

    Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four

    Authors: Stephan Wäldchen, Felix Huber, Sebastian Pokutta

    Abstract: One of the goals of Explainable AI (XAI) is to determine which input components were relevant for a classifier decision. This is commonly known as saliency attribution. Characteristic functions (from cooperative game theory) are able to evaluate partial inputs and form the basis for theoretically "fair" attribution methods like Shapley values. Given only a standard classifier function, it is unclea…

    Submitted 25 February, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: 19 pages, 9 figures, 1 table

    MSC Class: 68T01 (Primary) 91A12 (Secondary) ACM Class: I.2.0

  6. arXiv:2112.06532

    cs.LG math.PR math.ST

    A Complete Characterisation of ReLU-Invariant Distributions

    Authors: Jan Macdonald, Stephan Wäldchen

    Abstract: We give a complete characterisation of families of probability distributions that are invariant under the action of ReLU neural network layers. The need for such families arises during the training of Bayesian networks or the analysis of trained neural networks, e.g., in the context of uncertainty quantification (UQ) or explainable artificial intelligence (XAI). We prove that no invariant parametr…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 39 pages, 9 figures

  7. arXiv:1905.11092

    cs.LG cs.CC cs.IT stat.ML

    A Rate-Distortion Framework for Explaining Neural Network Decisions

    Authors: Jan Macdonald, Stephan Wäldchen, Sascha Hauch, Gitta Kutyniok

    Abstract: We formalise the widespread idea of interpreting neural network decisions as an explicit optimisation problem in a rate-distortion framework. A set of input features is deemed relevant for a classification decision if the expected classifier score remains nearly constant when randomising the remaining features. We discuss the computational complexity of finding small sets of relevant features and…

    Submitted 27 May, 2019; originally announced May 2019.

  8. arXiv:1905.09163

    cs.CC

    The Computational Complexity of Understanding Network Decisions

    Authors: Stephan Wäldchen, Jan Macdonald, Sascha Hauch, Gitta Kutyniok

    Abstract: For a Boolean function $Φ\colon\{0,1\}^d\to\{0,1\}$ and an assignment to its variables $\mathbf{x}=(x_1, x_2, \dots, x_d)$ we consider the problem of finding the subsets of the variables that are sufficient to determine the function value with a given probability $δ$. This is motivated by the task of interpreting predictions of binary classifiers described as Boolean circuits (which can be seen as…

    Submitted 18 June, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: added acknowledgements, added a reference

    MSC Class: 68Q25; 68Q17 ACM Class: F.2.0

  9. arXiv:1902.10178

    cs.AI cs.CV cs.LG cs.NE stat.ML

    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Authors: Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller

    Abstract: Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighte…

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: Accepted for publication in Nature Communications