-
Safe and Certifiable AI Systems: Concepts, Challenges, and Lessons Learned
Authors:
Kajetan Schweighofer,
Barbara Brune,
Lukas Gruber,
Simon Schmid,
Alexander Aufreiter,
Andreas Gruber,
Thomas Doms,
Sebastian Eder,
Florian Mayer,
Xaver-Paul Stadlbauer,
Christoph Schwald,
Werner Zellinger,
Bernhard Nessler,
Sepp Hochreiter
Abstract:
There is an increasing adoption of artificial intelligence in safety-critical applications, yet practical schemes for certifying that AI systems are safe, lawful and socially acceptable remain scarce. This white paper presents the TÜV AUSTRIA Trusted AI framework, an end-to-end audit catalog and methodology for assessing and certifying machine learning systems. The audit catalog has been in continuous development since 2019 in an ongoing collaboration with scientific partners. Building on three pillars (Secure Software Development, Functional Requirements, and Ethics & Data Privacy), the catalog translates the high-level obligations of the EU AI Act into specific, testable criteria. Its core concept of functional trustworthiness couples a statistically defined application domain with risk-based minimum performance requirements and statistical testing on independently sampled data, providing transparent and reproducible evidence of model quality in real-world settings. We provide an overview of the functional requirements that we assess, which are organized around the lifecycle of an AI system. In addition, we share some lessons learned from the practical application of the audit catalog, highlighting common pitfalls we encountered, such as data leakage scenarios, inadequate domain definitions, neglect of biases, or a lack of distribution drift controls. We further discuss key aspects of certifying AI systems, such as robustness, algorithmic fairness, or post-certification requirements, outlining both our current conclusions and a roadmap for future research. In general, by aligning technical best practices with emerging European standards, the approach offers regulators, providers, and users a practical roadmap for legally compliant, functionally trustworthy, and certifiable AI systems.
Submitted 8 September, 2025;
originally announced September 2025.
-
Parameterized Synthetic Text Generation with SimpleStories
Authors:
Lennart Finke,
Chandan Sreedhara,
Thomas Dooms,
Mat Allen,
Emerald Zhang,
Juan Diego Rodriguez,
Noa Nabeshima,
Thomas Marshall,
Dan Braun
Abstract:
We present SimpleStories, a large synthetic story dataset in simple language, consisting of 2 million samples each in English and Japanese. Through parameterizing prompts at multiple levels of abstraction, we achieve control over story characteristics at scale, inducing syntactic and semantic diversity. Ablations on a newly trained model suite show improved sample efficiency and model interpretability compared to the TinyStories dataset. We open-source all constituent parts of model creation, hoping to enable novel ways to study the end-to-end training process. As a byproduct, we move the frontier regarding the fewest-parameter language model that outputs grammatical natural language.
Submitted 30 May, 2025; v1 submitted 12 April, 2025;
originally announced April 2025.
-
Compositionality Unlocks Deep Interpretable Models
Authors:
Thomas Dooms,
Ward Gauderis,
Geraint A. Wiggins,
Jose Oramas
Abstract:
We propose $χ$-net, an intrinsically interpretable architecture combining the compositional multilinear structure of tensor networks with the expressivity and efficiency of deep neural networks. $χ$-nets retain equal accuracy compared to their baseline counterparts. Our novel, efficient diagonalisation algorithm, ODT, reveals linear low-rank structure in a multilayer SVHN model. We leverage this toward formal weight-based interpretability and model compression.
Submitted 3 April, 2025;
originally announced April 2025.
-
Tokenized SAEs: Disentangling SAE Reconstructions
Authors:
Thomas Dooms,
Daniel Wilhelm
Abstract:
Sparse auto-encoders (SAEs) have become a prevalent tool for interpreting language models' inner workings. However, it is unknown how tightly SAE features correspond to computationally important directions in the model. This work empirically shows that many RES-JB SAE features predominantly correspond to simple input statistics. We hypothesize this is caused by a large class imbalance in training data combined with a lack of complex error signals. To reduce this behavior, we propose a method that disentangles token reconstruction from feature reconstruction. This improvement is achieved by introducing a per-token bias, which provides an enhanced baseline for interesting reconstruction. As a result, significantly more interesting features and improved reconstruction in sparse regimes are learned.
Submitted 24 February, 2025;
originally announced February 2025.
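The per-token bias described in this abstract can be pictured with a minimal sketch. It assumes the reconstruction is the usual decoder sum plus a bias vector looked up by token id, which is one plausible reading of the abstract and not necessarily the authors' exact formulation. The names `reconstruct`, `decoder`, and `token_bias` are hypothetical.

```python
# Sketch under assumptions: reconstruction = decoder sum + per-token bias.
# All names here are hypothetical and not taken from the paper's code.

def reconstruct(features, decoder, token_bias, token_id):
    """Return x_hat = sum_k features[k] * decoder[k] + token_bias[token_id]."""
    # Start from the token-specific baseline instead of a single shared bias.
    x_hat = list(token_bias[token_id])
    for f_k, d_k in zip(features, decoder):
        for i, d in enumerate(d_k):
            x_hat[i] += f_k * d
    return x_hat


# Toy setup: two dictionary directions in a 2-dimensional activation space.
decoder = [[1.0, 0.0], [0.0, 1.0]]
token_bias = {7: [0.5, -0.5]}  # assumed learned lookup table, one row per token

baseline_only = reconstruct([0.0, 0.0], decoder, token_bias, token_id=7)
with_feature = reconstruct([2.0, 0.0], decoder, token_bias, token_id=7)
```

With all features at zero, the reconstruction falls back to the token's own baseline, so the sparse features only need to explain what the token identity alone does not.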
-
Bilinear MLPs enable weight-based mechanistic interpretability
Authors:
Michael T. Pearce,
Thomas Dooms,
Alice Rigg,
Jose M. Oramas,
Lee Sharkey
Abstract:
A mechanistic understanding of how MLPs do computation in deep neural networks remains elusive. Current interpretability work can extract features from hidden activations over an input dataset but generally cannot explain how MLP weights construct features. One challenge is that element-wise nonlinearities introduce higher-order interactions and make it difficult to trace computations through the MLP layer. In this paper, we analyze bilinear MLPs, a type of Gated Linear Unit (GLU) without any element-wise nonlinearity that nevertheless achieves competitive performance. Bilinear MLPs can be fully expressed in terms of linear operations using a third-order tensor, allowing flexible analysis of the weights. Analyzing the spectra of bilinear MLP weights using eigendecomposition reveals interpretable low-rank structure across toy tasks, image classification, and language modeling. We use this understanding to craft adversarial examples, uncover overfitting, and identify small language model circuits directly from the weights alone. Our results demonstrate that bilinear layers serve as an interpretable drop-in replacement for current activation functions and that weight-based interpretability is viable for understanding deep-learning models.
Submitted 25 June, 2025; v1 submitted 10 October, 2024;
originally announced October 2024.
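The claim that a bilinear MLP can be written with a third-order tensor can be checked on a toy case. The sketch below assumes the layer computes the element-wise product of two linear maps, `(W x) * (V x)`, and builds the tensor `B[a][i][j] = W[a][i] * V[a][j]`. The function names and the weights are illustrative assumptions, and biases and any output projection are omitted.

```python
# Sketch: a bilinear layer (W x) * (V x) equals a third-order tensor
# contracted twice with x. Names and values are illustrative assumptions.

def matvec(M, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def bilinear_layer(W, V, x):
    """Element-wise product of two linear maps, with no other nonlinearity."""
    return [a * b for a, b in zip(matvec(W, x), matvec(V, x))]


def interaction_tensor(W, V):
    """B[a][i][j] = W[a][i] * V[a][j], one input-input matrix per output unit."""
    return [[[w * v for v in v_row] for w in w_row] for w_row, v_row in zip(W, V)]


def apply_tensor(B, x):
    """out[a] = sum_ij B[a][i][j] * x[i] * x[j]."""
    n = len(x)
    return [sum(B_a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            for B_a in B]


W = [[1, 2], [0, -1]]
V = [[3, 0], [1, 1]]
x = [2, 5]

direct = bilinear_layer(W, V, x)
via_tensor = apply_tensor(interaction_tensor(W, V), x)
```

Each slice `B[a]` is an ordinary matrix over input pairs, which is what makes spectral tools such as eigendecomposition applicable to the weights directly.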
-
Weight-based Decomposition: A Case for Bilinear MLPs
Authors:
Michael T. Pearce,
Thomas Dooms,
Alice Rigg
Abstract:
Gated Linear Units (GLUs) have become a common building block in modern foundation models. Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs. An attractive quality of bilinear layers is that they can be fully expressed in terms of a third-order tensor and linear operations. Leveraging this, we develop a method to decompose the bilinear tensor into a set of sparsely interacting eigenvectors that show promising interpretability properties in preliminary experiments for shallow image classifiers (MNIST) and small language models (Tiny Stories). Since the decomposition is fully equivalent to the model's original computations, bilinear layers may be an interpretability-friendly architecture that helps connect features to the model weights. Application of our method may not be limited to pretrained bilinear models since we find that language models such as TinyLlama-1.1B can be finetuned into bilinear variants.
Submitted 6 June, 2024;
originally announced June 2024.
-
The Trifecta: Three simple techniques for training deeper Forward-Forward networks
Authors:
Thomas Dooms,
Ing Jyh Tsang,
Jose Oramas
Abstract:
Modern machine learning models are able to outperform humans on a variety of non-trivial tasks. However, as the complexity of the models increases, they consume significant amounts of power and still struggle to generalize effectively to unseen data. Local learning, which focuses on updating subsets of a model's parameters at a time, has emerged as a promising technique to address these issues. Recently, a novel local learning algorithm, called Forward-Forward, has received widespread attention due to its innovative approach to learning. Unfortunately, its application has been limited to smaller datasets due to scalability issues. To this end, we propose The Trifecta, a collection of three simple techniques that synergize exceptionally well and drastically improve the Forward-Forward algorithm on deeper networks. Our experiments demonstrate that our models are on par with similarly structured, backpropagation-based models in both training speed and test accuracy on simple datasets. This is achieved by the ability to learn representations that are informative locally, on a layer-by-layer basis, and retain their informativeness when propagated to deeper layers in the architecture. This leads to around 84% accuracy on CIFAR-10, a notable improvement (25%) over the original FF algorithm. These results highlight the potential of Forward-Forward as a genuine competitor to backpropagation and as a promising research avenue.
Submitted 12 December, 2023; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Functional trustworthiness of AI systems by statistically valid testing
Authors:
Bernhard Nessler,
Thomas Doms,
Sepp Hochreiter
Abstract:
The authors are concerned about the safety, health, and rights of European citizens due to inadequate measures and procedures required by the current draft of the EU Artificial Intelligence (AI) Act for the conformity assessment of AI systems. We observe that not only the current draft of the EU AI Act, but also the accompanying standardization efforts in CEN/CENELEC, have resorted to the position that real functional guarantees of AI systems are supposedly unrealistic and too complex anyway. Yet enacting a conformity assessment procedure that creates an illusion of trust in insufficiently assessed AI systems is at best naive and at worst grossly negligent. The EU AI Act thus misses the point of ensuring quality by functional trustworthiness and correctly attributing responsibilities.
The trustworthiness of an AI decision system lies first and foremost in the correct statistical testing on randomly selected samples and in the precision of the definition of the application domain, which enables drawing samples in the first place. We will subsequently call this testable quality functional trustworthiness. It includes a design, development, and deployment that enables correct statistical testing of all relevant functions.
We are firmly convinced, and advocate, that a reliable assessment of the statistical functional properties of an AI system has to be the indispensable, mandatory nucleus of the conformity assessment. In this paper, we describe the three elements necessary to establish reliable functional trustworthiness, i.e., (1) the definition of the technical distribution of the application, (2) the risk-based minimum performance requirements, and (3) the statistically valid testing based on independent random samples.
Submitted 4 October, 2023;
originally announced October 2023.
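Elements (2) and (3) of this abstract, a minimum performance requirement checked on independent random samples, can be illustrated with a small sketch. It uses a one-sided Hoeffding bound as the statistical test, which is an assumption made here for simplicity: the abstract does not name a specific test. The function names and the example numbers are hypothetical.

```python
# Sketch: test a minimum-accuracy requirement on an i.i.d. test sample.
# The Hoeffding bound is an assumed choice; the abstract names no test.
import math


def hoeffding_lower_bound(correct, n, delta):
    """Lower bound on true accuracy that holds with probability >= 1 - delta,
    assuming the n test samples are drawn i.i.d. from the application domain."""
    p_hat = correct / n
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - margin)


def meets_requirement(correct, n, min_accuracy, delta=0.05):
    """True if the lower confidence bound clears the required minimum."""
    return hoeffding_lower_bound(correct, n, delta) >= min_accuracy


# Hypothetical audit: 950 of 1000 independently sampled cases are correct.
bound = hoeffding_lower_bound(950, 1000, delta=0.05)
passes_90 = meets_requirement(950, 1000, min_accuracy=0.90)
passes_92 = meets_requirement(950, 1000, min_accuracy=0.92)
```

The observed accuracy of 95% supports a 90% requirement but not a 92% one at this sample size, which shows why the requirement has to be compared with a confidence bound and not with the raw test score.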
-
Trusted Artificial Intelligence: Towards Certification of Machine Learning Applications
Authors:
Philip Matthias Winter,
Sebastian Eder,
Johannes Weissenböck,
Christoph Schwald,
Thomas Doms,
Tom Vogt,
Sepp Hochreiter,
Bernhard Nessler
Abstract:
Artificial Intelligence is one of the fastest growing technologies of the 21st century and accompanies us in our daily lives when interacting with technical applications. However, reliance on such technical systems is crucial for their widespread applicability and acceptance. The societal tools to express reliance are usually formalized by lawful regulations, i.e., standards, norms, accreditations, and certificates. Therefore, the TÜV AUSTRIA Group, in cooperation with the Institute for Machine Learning at the Johannes Kepler University Linz, proposes a certification process and an audit catalog for Machine Learning applications. We are convinced that our approach can serve as the foundation for the certification of applications that use Machine Learning and Deep Learning, the techniques that drive the current revolution in Artificial Intelligence. While certain high-risk areas, such as fully autonomous robots in workspaces shared with humans, are still some time away from certification, we aim to cover low-risk applications with our certification procedure. Our holistic approach attempts to analyze Machine Learning applications from multiple perspectives to evaluate and verify the aspects of secure software development, functional requirements, data quality, data protection, and ethics. Inspired by existing work, we introduce four criticality levels to map the criticality of a Machine Learning application regarding the impact of its decisions on people, environment, and organizations. Currently, the audit catalog can be applied to low-risk applications within the scope of supervised learning as commonly encountered in industry. Guided by field experience, scientific developments, and market demands, the audit catalog will be extended and modified accordingly.
Submitted 31 March, 2021;
originally announced March 2021.