-
Towards transparent and data-driven fault detection in manufacturing: A case study on univariate, discrete time series
Authors:
Bernd Hofmann,
Patrick Bruendl,
Huong Giang Nguyen,
Joerg Franke
Abstract:
Ensuring consistent product quality in modern manufacturing is crucial, particularly in safety-critical applications. Conventional quality control approaches, reliant on manually defined thresholds and features, lack adaptability to the complexity and variability inherent in production data and necessitate extensive domain expertise. Conversely, data-driven methods, such as machine learning, demonstrate high detection performance but typically function as black-box models, thereby limiting their acceptance in industrial environments where interpretability is paramount. This paper introduces a methodology for industrial fault detection, which is both data-driven and transparent. The approach integrates a supervised machine learning model for multi-class fault classification, Shapley Additive Explanations for post-hoc interpretability, and a domain-specific visualisation technique that maps model explanations to operator-interpretable features. Furthermore, the study proposes an evaluation methodology that assesses model explanations through quantitative perturbation analysis and evaluates visualisations by qualitative expert assessment. The approach was applied to the crimping process, a safety-critical joining technique, using a dataset of univariate, discrete time series. The system achieves a fault detection accuracy of 95.9%, and both quantitative selectivity analysis and qualitative expert evaluations confirmed the relevance and interpretability of the generated explanations. This human-centric approach is designed to enhance trust and interpretability in data-driven fault detection, thereby contributing to applied system design in industrial quality control.
Submitted 30 June, 2025;
originally announced July 2025.
-
Learning in Compact Spaces with Approximately Normalized Transformers
Authors:
Jörg K. H. Franke,
Urs Spiegelhalter,
Marianna Nezhurina,
Jenia Jitsev,
Frank Hutter,
Michael Hefenbrock
Abstract:
In deep learning, regularization and normalization are common solutions for challenges such as overfitting, numerical instabilities, and the increasing variance in the residual stream. An alternative approach is to force all parameters and representations to lie on a hypersphere. This removes the need for regularization and increases convergence speed, but comes with additional costs. In this work, we propose a more holistic but approximate normalization (anTransformer). Our approach constrains the norm of parameters and normalizes all representations via scalar multiplications motivated by the tight concentration of the norms of high-dimensional random vectors. When applied to GPT training, we observe a 40% faster convergence compared to models with QK normalization, with less than 3% additional runtime. Deriving scaling laws for anGPT, we found our method enables training with larger batch sizes and fewer hyperparameters, while matching the favorable scaling characteristics of classic GPT architectures.
Submitted 28 May, 2025;
originally announced May 2025.
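The abstract above motivates its scalar normalization by the tight concentration of the norms of high-dimensional random vectors. The following is only a small numerical illustration of that concentration effect, assuming standard Gaussian vectors; the function name `norm_stats` and the chosen dimensions are hypothetical, and this is not the anTransformer implementation.

```python
import math
import random


def norm_stats(dim, samples=200, seed=0):
    """Return mean and std of ||x|| / sqrt(dim) for x drawn from N(0, I_dim)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(samples):
        squared_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        ratios.append(math.sqrt(squared_norm / dim))
    mean = sum(ratios) / samples
    variance = sum((r - mean) ** 2 for r in ratios) / samples
    return mean, math.sqrt(variance)


# The ratio stays close to 1 and its spread shrinks as the dimension grows,
# which is why a fixed scalar can stand in for an explicit normalization.
for dim in (16, 256, 4096):
    mean, std = norm_stats(dim)
    print(f"dim={dim:5d}  mean={mean:.3f}  std={std:.4f}")
```

In this toy setting the spread of the ratio falls roughly like 1/sqrt(2 * dim), so in high dimensions dividing by sqrt(dim) is nearly equivalent to dividing by the actual norm.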
-
Accelerating db-A* for Kinodynamic Motion Planning Using Diffusion
Authors:
Julius Franke,
Akmaral Moldagalieva,
Pia Hanfeld,
Wolfgang Hönig
Abstract:
We present a novel approach for generating motion primitives for kinodynamic motion planning using diffusion models. The motions generated by our approach are adapted to each problem instance by utilizing problem-specific parameters, allowing for finding solutions faster and of better quality. The diffusion models used in our approach are trained on randomly cut solution trajectories. These trajectories are created by solving randomly generated problem instances with a kinodynamic motion planner. Experimental results show significant improvements up to 30 percent in both computation time and solution quality across varying robot dynamics such as second-order unicycle or car with trailer.
Submitted 10 March, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Authors:
Riccardo Grazzi,
Julien Siems,
Arber Zela,
Jörg K. H. Franke,
Frank Hutter,
Massimiliano Pontil
Abstract:
Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and DeltaNet have emerged as efficient alternatives to Transformers for long sequences. However, both Transformers and LRNNs struggle to perform state-tracking, which may impair performance in tasks such as code evaluation. In one forward pass, current architectures are unable to solve even parity, the simplest state-tracking task, which non-linear RNNs can handle effectively. Recently, Sarrof et al. (2024) demonstrated that the failure of LRNNs like Mamba to solve parity stems from restricting the value range of their diagonal state-transition matrices to $[0, 1]$ and that incorporating negative values can resolve this issue. We extend this result to non-diagonal LRNNs such as DeltaNet. We prove that finite precision LRNNs with state-transition matrices having only positive eigenvalues cannot solve parity, while non-triangular matrices are needed to count modulo $3$. Notably, we also prove that LRNNs can learn any regular language when their state-transition matrices are products of identity minus vector outer product matrices, each with eigenvalues in the range $[-1, 1]$. Our experiments confirm that extending the eigenvalue range of Mamba and DeltaNet to include negative values not only enables them to solve parity but consistently improves their performance on state-tracking tasks. We also show that state-tracking enabled LRNNs can be pretrained stably and efficiently at scale (1.3B parameters), achieving competitive performance on language modeling and showing promise on code and math tasks.
Submitted 18 March, 2025; v1 submitted 19 November, 2024;
originally announced November 2024.
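The parity argument in the abstract above can be illustrated with a scalar toy recurrence. The sketch below assumes a one-dimensional state with an input-dependent multiplier and no additive input term; the function names are hypothetical and this is not the paper's construction or proof.

```python
from itertools import product


def linear_rnn_final_state(bits, a_zero, a_one, h0=1.0):
    """Run h_t = a(x_t) * h_{t-1}, where the multiplier depends on the input bit."""
    h = h0
    for bit in bits:
        h = (a_one if bit else a_zero) * h
    return h


def parity_from_state(h):
    """Read parity from the sign of the state: negative means an odd number of ones."""
    return 1 if h < 0 else 0


# With a multiplier of -1 for input 1, the state flips sign on every 1,
# so its sign encodes parity exactly.
for bits in product((0, 1), repeat=6):
    h = linear_rnn_final_state(bits, a_zero=1.0, a_one=-1.0)
    assert parity_from_state(h) == sum(bits) % 2

# With multipliers restricted to [0, 1], the state can only shrink towards 0
# and never changes sign, so the sign carries no parity information.
h = linear_rnn_final_state((1, 1, 1), a_zero=1.0, a_one=0.5)
print(h)  # stays positive
```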
-
Transfer Learning for Finetuning Large Language Models
Authors:
Tobias Strangmann,
Lennart Purucker,
Jörg K. H. Franke,
Ivo Rapant,
Fabio Ferreira,
Frank Hutter
Abstract:
As the landscape of large language models expands, efficiently finetuning for specific tasks becomes increasingly crucial. At the same time, the landscape of parameter-efficient finetuning methods rapidly expands. Consequently, practitioners face a multitude of complex choices when searching for an optimal finetuning pipeline for large language models. To reduce the complexity for practitioners, we investigate transfer learning for finetuning large language models and aim to transfer knowledge about configurations from related finetuning tasks to a new task. In this work, we transfer learn finetuning by meta-learning performance and cost surrogate models for grey-box meta-optimization from a new meta-dataset. Counter-intuitively, we propose to rely only on transfer learning for new datasets. Thus, we do not use task-specific Bayesian optimization but prioritize knowledge transferred from related tasks over task-specific feedback. We evaluate our method on eight synthetic question-answer datasets and a meta-dataset consisting of 1,800 runs of finetuning Microsoft's Phi-3. Our transfer learning is superior to zero-shot, default finetuning, and meta-optimization baselines. Our results demonstrate the transferability of finetuning to adapt large language models more effectively.
Submitted 2 November, 2024;
originally announced November 2024.
-
MyoGestic: EMG Interfacing Framework for Decoding Multiple Spared Degrees of Freedom of the Hand in Individuals with Neural Lesions
Authors:
Raul C. Sîmpetru,
Dominik I. Braun,
Arndt U. Simon,
Michael März,
Vlad Cnejevici,
Daniela Souza de Oliveira,
Nico Weber,
Jonas Walter,
Jörg Franke,
Daniel Höglinger,
Cosima Prahm,
Matthias Ponfick,
Alessandro Del Vecchio
Abstract:
Restoring limb motor function in individuals with spinal cord injury (SCI), stroke, or amputation remains a critical challenge, one which affects millions worldwide. Recent studies show through surface electromyography (EMG) that spared motor neurons can still be voluntarily controlled, even without visible limb movement. These signals can be decoded and used for motor intent estimation; however, current wearable solutions lack the necessary hardware and software for intuitive interfacing of the spared degrees of freedom after neural injuries. To address these limitations, we developed a wireless, high-density EMG bracelet, coupled with a novel software framework, MyoGestic. Our system allows rapid and tailored adaptability of machine learning models to the needs of the users, facilitating real-time decoding of multiple spared distinctive degrees of freedom. In our study, we successfully decoded the motor intent from two participants with SCI, two with spinal stroke, and three amputees in real-time, achieving several controllable degrees of freedom within minutes after wearing the EMG bracelet. We provide a proof-of-concept that these decoded signals can be used to control a digitally rendered hand, a wearable orthosis, a prosthesis, or a 2D cursor. Our framework promotes a participant-centered approach, allowing immediate feedback integration, thus enhancing the iterative development of myocontrol algorithms. The proposed open-source software framework, MyoGestic, allows researchers and patients to focus on the augmentation and training of the spared degrees of freedom after neural lesions, thus potentially bridging the gap between research and clinical application and advancing the development of intuitive EMG interfaces for diverse neural lesions.
Submitted 14 August, 2024;
originally announced August 2024.
-
Fast Optimizer Benchmark
Authors:
Simon Blauth,
Tobias Bürger,
Zacharias Häringer,
Jörg Franke,
Frank Hutter
Abstract:
In this paper, we present the Fast Optimizer Benchmark (FOB), a tool designed for evaluating deep learning optimizers during their development. The benchmark supports tasks from multiple domains such as computer vision, natural language processing, and graph learning. The focus is on convenient usage, featuring human-readable YAML configurations, SLURM integration, and plotting utilities. FOB can be used together with existing hyperparameter optimization (HPO) tools as it handles training and resuming of runs. The modular design enables integration into custom pipelines, using it simply as a collection of tasks. We showcase an optimizer comparison as a usage example of our tool. FOB can be found on GitHub: https://github.com/automl/FOB.
Submitted 26 June, 2024;
originally announced June 2024.
-
HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models
Authors:
Rhea Sanjay Sukthanker,
Arber Zela,
Benedikt Staffler,
Aaron Klein,
Lennart Purucker,
Joerg K. H. Franke,
Frank Hutter
Abstract:
The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training and evaluation on multiple devices. To address this, we introduce HW-GPT-Bench, a hardware-aware benchmark that utilizes surrogate predictions to approximate various hardware metrics across 13 devices of architectures in the GPT-2 family, with architectures containing up to 1.55B parameters. Our surrogates, via calibrated predictions and reliable uncertainty estimates, faithfully model the heteroscedastic noise inherent in the energy and latency measurements. To estimate perplexity, we employ weight-sharing techniques from Neural Architecture Search (NAS), inheriting pretrained weights from the largest GPT-2 model. Finally, we demonstrate the utility of HW-GPT-Bench by simulating optimization trajectories of various multi-objective optimization algorithms in just a few seconds.
Submitted 3 November, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Rethinking Performance Measures of RNA Secondary Structure Problems
Authors:
Frederic Runge,
Jörg K. H. Franke,
Daniel Fertmann,
Frank Hutter
Abstract:
Accurate RNA secondary structure prediction is vital for understanding cellular regulation and disease mechanisms. Deep learning (DL) methods have surpassed traditional algorithms by predicting complex features like pseudoknots and multi-interacting base pairs. However, traditional distance measures can hardly deal with such tertiary interactions and the currently used evaluation measures (F1 score, MCC) have limitations. We propose the Weisfeiler-Lehman graph kernel (WL) as an alternative metric. Embracing graph-based metrics like WL enables fair and accurate evaluation of RNA structure prediction algorithms. Further, WL provides informative guidance, as demonstrated in an RNA design experiment.
Submitted 4 December, 2023;
originally announced January 2024.
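To make the idea of a Weisfeiler-Lehman graph kernel on RNA structures more concrete, here is a minimal sketch of a generic WL subtree kernel. It assumes a graph encoding in which nucleotides are nodes and both backbone links and base pairs are edges; that encoding, the normalization, and all function names are illustrative assumptions, not the evaluation code of the paper above.

```python
import math
from collections import Counter


def rna_graph(sequence, pairs):
    """Build a graph: nodes are positions, edges are backbone links plus base pairs."""
    adj = {i: set() for i in range(len(sequence))}
    for i in range(len(sequence) - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    for i, j in pairs:  # a position may appear in several pairs
        adj[i].add(j)
        adj[j].add(i)
    labels = {i: nucleotide for i, nucleotide in enumerate(sequence)}
    return adj, labels


def wl_features(adj, labels, iterations=2):
    """Count node labels over WL refinement rounds (round 0 is the raw labels)."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels.
        current = {
            v: (current[v], tuple(sorted(current[u] for u in adj[v])))
            for v in adj
        }
        features.update(current.values())
    return features


def wl_similarity(graph_a, graph_b, iterations=2):
    """Normalized WL kernel in [0, 1]; 1 means identical label histograms."""
    fa = wl_features(*graph_a, iterations)
    fb = wl_features(*graph_b, iterations)

    def dot(x, y):
        return sum(x[k] * y[k] for k in x)

    return dot(fa, fb) / math.sqrt(dot(fa, fa) * dot(fb, fb))


sequence = "GGAACC"
reference = rna_graph(sequence, [(0, 5), (1, 4)])
prediction = rna_graph(sequence, [(0, 5)])
print(wl_similarity(reference, reference))
print(wl_similarity(reference, prediction))
```

Because the structure is treated as a general graph, pseudoknots and positions with more than one pairing partner need no special handling in this sketch.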
-
Non-Sequential Ensemble Kalman Filtering using Distributed Arrays
Authors:
Cédric Travelletti,
Jörg Franke,
David Ginsbourger,
Stefan Brönnimann
Abstract:
This work introduces a new, distributed implementation of the Ensemble Kalman Filter (EnKF) that allows for non-sequential assimilation of large datasets in high-dimensional problems. The traditional EnKF algorithm is computationally intensive and exhibits difficulties in applications requiring interaction with the background covariance matrix, prompting the use of methods like sequential assimilation which can introduce unwanted consequences, such as dependency on observation ordering. Our implementation leverages recent advancements in distributed computing to enable the construction and use of the full model error covariance matrix in distributed memory, allowing for single-batch assimilation of all observations and eliminating order dependencies. Comparative performance assessments, involving both synthetic and real-world paleoclimatic reconstruction applications, indicate that the new, non-sequential implementation outperforms the traditional, sequential one.
Submitted 21 November, 2023;
originally announced November 2023.
-
Improving Deep Learning Optimization through Constrained Parameter Regularization
Authors:
Jörg K. H. Franke,
Michael Hefenbrock,
Gregor Koehler,
Frank Hutter
Abstract:
Regularization is a critical component in deep learning. The most commonly used approach, weight decay, applies a constant penalty coefficient uniformly across all parameters. This may be overly restrictive for some parameters, while insufficient for others. To address this, we present Constrained Parameter Regularization (CPR) as an alternative to traditional weight decay. Unlike the uniform application of a single penalty, CPR enforces an upper bound on a statistical measure, such as the L2-norm, of individual parameter matrices. Consequently, learning becomes a constrained optimization problem, which we tackle using an adaptation of the augmented Lagrangian method. CPR introduces only a minor runtime overhead and only requires setting an upper bound. We propose simple yet efficient mechanisms for initializing this bound, making CPR rely on no hyperparameter or one, akin to weight decay. Our empirical studies on computer vision and language modeling tasks demonstrate CPR's effectiveness. The results show that CPR can outperform traditional weight decay and increase performance in pre-training and fine-tuning.
Submitted 7 December, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
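The idea of bounding a norm with a Lagrange multiplier instead of a fixed weight-decay coefficient can be sketched on a toy problem. The code below is a simplified primal-dual loop under stated assumptions: a quadratic loss, the squared L2 norm as the bounded measure, and a hypothetical multiplier step size `mu`. It is not the authors' CPR algorithm or its initialization scheme.

```python
def constrained_toy_fit(target, kappa, lr=0.05, mu=0.1, steps=2000):
    """Minimize 0.5 * ||theta - target||^2 subject to ||theta||^2 <= kappa."""
    theta = [0.0 for _ in target]
    lam = 0.0  # multiplier for this parameter group, starts inactive
    for _ in range(steps):
        grad_loss = [t - g for t, g in zip(theta, target)]
        grad_measure = [2.0 * t for t in theta]  # gradient of ||theta||^2
        # Primal step: the multiplier acts like an adaptive decay coefficient.
        theta = [
            t - lr * (gl + lam * gm)
            for t, gl, gm in zip(theta, grad_loss, grad_measure)
        ]
        # Dual step: grow the multiplier while the bound is violated,
        # shrink it (never below zero) once the bound is satisfied.
        squared_norm = sum(t * t for t in theta)
        lam = max(0.0, lam + mu * (squared_norm - kappa))
    return theta, lam


theta, lam = constrained_toy_fit(target=[3.0, 4.0], kappa=1.0)
print(theta, lam)
```

When the bound is loose, for example `kappa=100.0` with the same target, the multiplier stays at zero and the parameters simply reach the unconstrained optimum, which is the behaviour that distinguishes this from a constant penalty.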
-
Beyond Random Augmentations: Pretraining with Hard Views
Authors:
Fabio Ferreira,
Ivo Rapant,
Jörg K. H. Franke,
Frank Hutter
Abstract:
Self-Supervised Learning (SSL) methods typically rely on random image augmentations, or views, to make models invariant to different transformations. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple yet effective approach is to select hard views that yield a higher loss. In this paper, we propose Hard View Pretraining (HVP), a learning-free strategy that extends random view generation by exposing models to more challenging samples during SSL pretraining. HVP encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss according to the current model state, and 4) perform a backward pass with the selected pair. In contrast to existing hard view literature, we are the first to demonstrate hard view pretraining's effectiveness at scale, particularly training on the full ImageNet-1k dataset, and evaluating across multiple SSL methods, ConvNets, and ViTs. As a result, HVP sets a new state-of-the-art on DINO ViT-B/16, reaching 78.8% linear evaluation accuracy (a 0.6% improvement) and consistent gains of 1% for both 100 and 300 epoch pretraining, with similar improvements across transfer tasks in DINO, SimSiam, iBOT, and SimCLR.
Submitted 6 February, 2025; v1 submitted 5 October, 2023;
originally announced October 2023.
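The four iterative steps listed in the abstract above can be sketched as a selection routine. This is a toy illustration only: views are stand-in scalar embeddings, `pair_loss` is a hypothetical squared-distance loss, and the backward pass is left as a comment; it is not the HVP implementation.

```python
import random
from itertools import combinations


def select_hardest_pair(views, pair_loss):
    """Steps 2 and 3: score every pair of views and keep the highest-loss pair."""
    return max(combinations(views, 2), key=lambda pair: pair_loss(*pair))


def toy_pair_loss(view_a, view_b):
    """Stand-in for an SSL loss: squared distance between toy embeddings."""
    return (view_a - view_b) ** 2


rng = random.Random(0)
# Step 1: sample several random views (here: random scalars standing in for
# the embeddings obtained by forwarding each augmented image through the model).
views = [rng.random() for _ in range(4)]
hard_a, hard_b = select_hardest_pair(views, toy_pair_loss)
# Step 4 would run the backward pass on the selected pair only.
print(hard_a, hard_b, toy_pair_loss(hard_a, hard_b))
```

With N sampled views there are N * (N - 1) / 2 candidate pairs, so the extra cost of the selection is forward passes and loss evaluations only, with a single backward pass per step.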
-
RecycleNet: Latent Feature Recycling Leads to Iterative Decision Refinement
Authors:
Gregor Koehler,
Tassilo Wald,
Constantin Ulrich,
David Zimmerer,
Paul F. Jaeger,
Jörg K. H. Franke,
Simon Kohl,
Fabian Isensee,
Klaus H. Maier-Hein
Abstract:
Despite the remarkable success of deep learning systems over the last decade, a key difference still remains between neural network and human decision-making: As humans, we can not only form a decision on the spot, but also ponder, revisiting an initial guess from different angles, distilling relevant information, and arriving at a better decision. Here, we propose RecycleNet, a latent feature recycling method, instilling the pondering capability for neural networks to refine initial decisions over a number of recycling steps, where outputs are fed back into earlier network layers in an iterative fashion. This approach makes minimal assumptions about the neural network architecture and thus can be implemented in a wide variety of contexts. Using medical image segmentation as the evaluation environment, we show that latent feature recycling enables the network to iteratively refine initial predictions even beyond the iterations seen during training, converging towards an improved decision. We evaluate this across a variety of segmentation benchmarks and show consistent improvements even compared with top-performing segmentation methods. This allows trading increased computation time for improved performance, which can be beneficial, especially for safety-critical applications.
Submitted 14 September, 2023;
originally announced September 2023.
-
Scalable Deep Learning for RNA Secondary Structure Prediction
Authors:
Jörg K. H. Franke,
Frederic Runge,
Frank Hutter
Abstract:
The field of RNA secondary structure prediction has made significant progress with the adoption of deep learning techniques. In this work, we present the RNAformer, a lean deep learning model using axial attention and recycling in the latent space. We gain performance improvements by designing the architecture for modeling the adjacency matrix directly in the latent space and by scaling the size of the model. Our approach achieves state-of-the-art performance on the popular TS0 benchmark dataset and even outperforms methods that use external information. Further, we show experimentally that the RNAformer can learn a biophysical model of the RNA folding process.
Submitted 14 July, 2023;
originally announced July 2023.
-
Towards Automated Design of Riboswitches
Authors:
Frederic Runge,
Jörg K. H. Franke,
Frank Hutter
Abstract:
Experimental screening and selection pipelines for the discovery of novel riboswitches are expensive, time-consuming, and inefficient. Using computational methods to reduce the number of candidates for the screen could drastically decrease these costs. However, existing computational approaches do not fully satisfy all requirements for the design of such initial screening libraries. In this work, we present a new method, libLEARNA, capable of providing RNA focus libraries of diverse variable-length qualified candidates. Our novel structure-based design approach considers global properties as well as desired sequence and structure features. We demonstrate the benefits of our method by designing theophylline riboswitch libraries, following a previously published protocol, and yielding 30% more unique high-quality candidates.
Submitted 17 July, 2023;
originally announced July 2023.
-
Insight into cloud processes from unsupervised classification with a rotationally invariant autoencoder
Authors:
Takuya Kurihana,
James Franke,
Ian Foster,
Ziwei Wang,
Elisabeth Moyer
Abstract:
Clouds play a critical role in the Earth's energy budget and their potential changes are one of the largest uncertainties in future climate projections. However, the use of satellite observations to understand cloud feedbacks in a warming climate has been hampered by the simplicity of existing cloud classification schemes, which are based on single-pixel cloud properties rather than utilizing spatial structures and textures. Recent advances in computer vision enable the grouping of different patterns of images without using human-predefined labels, providing a novel means of automated cloud classification. This unsupervised learning approach allows discovery of unknown climate-relevant cloud patterns, and the automated processing of large datasets. We describe here the use of such methods to generate a new AI-driven Cloud Classification Atlas (AICCA), which leverages 22 years and 800 terabytes of MODIS satellite observations over the global ocean. We use a rotation-invariant cloud clustering (RICC) method to classify those observations into 42 AI-generated cloud class labels at ~100 km spatial resolution. As a case study, we use AICCA to examine a recent finding of decreasing cloudiness in a critical part of the subtropical stratocumulus deck, and show that the change is accompanied by strong trends in cloud classes.
Submitted 20 November, 2022; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design
Authors:
Jörg K. H. Franke,
Frederic Runge,
Frank Hutter
Abstract:
Our world is ambiguous and this is reflected in the data we use to train our algorithms. This is particularly true when we try to model natural processes where collected data is affected by noisy measurements and differences in measurement techniques. Sometimes, the process itself is ambiguous, such as in the case of RNA folding, where the same nucleotide sequence can fold into different structures. This suggests that a predictive model should have similar probabilistic characteristics to match the data it models. Therefore, we propose a hierarchical latent distribution to enhance one of the most successful deep learning models, the Transformer, to accommodate ambiguities and data distributions. We show the benefits of our approach (1) on a synthetic task that captures the ability to learn a hidden data distribution, (2) with state-of-the-art results in RNA folding that reveal advantages on highly ambiguous data, and (3) demonstrating its generative capabilities on property-based molecule design by implicitly learning the underlying distributions and outperforming existing work.
Submitted 14 November, 2022; v1 submitted 27 May, 2022;
originally announced May 2022.
-
Practitioner Motives to Use Different Hyperparameter Optimization Methods
Authors:
Niclas Kannengießer,
Niklas Hasebrook,
Felix Morsbach,
Marc-André Zöller,
Jörg Franke,
Marius Lindauer,
Frank Hutter,
Ali Sunyaev
Abstract:
Programmatic hyperparameter optimization (HPO) methods, such as Bayesian optimization and evolutionary algorithms, are highly sample-efficient in identifying optimal hyperparameter configurations for machine learning (ML) models. However, practitioners frequently use less efficient methods, such as grid search, which can lead to under-optimized models. We suspect this behavior is driven by a range of practitioner-specific motives. Practitioner motives, however, still need to be clarified to enhance user-centered development of HPO tools. To uncover practitioner motives to use different HPO methods, we conducted 20 semi-structured interviews and an online survey with 49 ML experts. By presenting main goals (e.g., increase ML model understanding) and contextual factors affecting practitioners' selection of HPO methods (e.g., available computer resources), this study offers a conceptual foundation to better understand why practitioners use different HPO methods, supporting development of more user-centered and context-adaptive HPO tools in automated ML.
Submitted 16 May, 2025; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Automatic Plane Adjustment of Orthopedic Intra-operative Flat Panel Detector CT-Volumes
Authors:
Celia Martin Vicario,
Florian Kordon,
Felix Denzinger,
Jan Siad El Barbari,
Maxim Privalov,
Jochen Franke,
Sarina Thomas,
Lisa Kausch,
Andreas Maier,
Holger Kunze
Abstract:
Purpose
3D acquisitions are often performed to assess the result in orthopedic trauma surgery. With a mobile C-Arm system, these acquisitions can be performed intra-operatively, which reduces the number of required revision surgeries. However, due to the operating room setup, the acquisitions typically cannot be performed such that the acquired volumes are aligned to the anatomical regions. Thus, the multiplanar reconstructed (MPR) planes need to be adjusted manually during the review of the volume. In this paper, we present a detailed study of multi-task learning (MTL) regression networks to estimate the parameters of the MPR planes.
Approach
First, various mathematical descriptions for rotation, including Euler angle, quaternion, and matrix representations, are reviewed. Then, three different MTL network architectures based on the PoseNet are compared with a single-task learning network.
Results
Using a matrix description rather than the Euler angle description, the accuracy of the regressed normals improves from $7.7^{\circ}$ to $7.3^{\circ}$ in the mean value for single anatomies. The multi-head approach improves the regression of the plane position from $7.4\,\mathrm{mm}$ to $6.1\,\mathrm{mm}$, while the orientation does not benefit from this approach.
Conclusions
The results show that a multi-head approach can lead to slightly better results than the individual task networks. The most important benefit of the MTL approach is that it is a single network for standard plane regression for all body regions with a reduced number of stored parameters.
Submitted 15 September, 2021;
originally announced September 2021.
-
Computational timbre and tonal system similarity analysis of the music of Northern Myanmar-based Kachin compared to Xinjiang-based Uyghur ethnic groups
Authors:
Rolf Bader,
Michael Blaß,
Jonas Franke
Abstract:
The music of the Kachin ethnic group of Northern Myanmar is compared to the Uyghur music of Xinjiang in western China, using timbre and pitch feature extraction and machine learning. Although separated by Tibet, the muqam tradition of Xinjiang might be found in Kachin music due to myths of Kachin origin, as well as linguistic similarities, e.g., the Kachin term 'makan' for a musical piece. Extractions were performed using the apollon and COMSAR (Computational Music and Sound Archiving) frameworks, on which the Ethnographic Sound Recordings Archive (ESRA) is based, using ethnographic recordings from ESRA next to additional pieces. In terms of pitch, tonal systems were compared using a Kohonen self-organizing map (SOM), which clearly clusters Kachin and Uyghur musical pieces. This is mainly caused by the Xinjiang muqam music showing just fifth and fourth, while Kachin pieces tend to have a higher fifth and fourth, next to other dissimilarities. Also, the timbre features of spectral centroid and spectral sharpness standard deviation clearly tell Uyghur from Kachin pieces, where Uyghur music shows much larger deviations. Although more features will be compared in the future, like rhythm or melody, these already strong findings might introduce an alternative comparison methodology of ethnic groups beyond traditional linguistic definitions.
Submitted 15 March, 2021;
originally announced March 2021.
-
Hyperparameter Transfer Across Developer Adjustments
Authors:
Danny Stoll,
Jörg K. H. Franke,
Diane Wagner,
Simon Selg,
Frank Hutter
Abstract:
After developer adjustments to a machine learning (ML) algorithm, how can the results of an old hyperparameter optimization (HPO) automatically be used to speed up a new HPO? This question poses a challenging problem, as developer adjustments can change which hyperparameter settings perform well, or even the hyperparameter search space itself. While many approaches exist that leverage knowledge obtained on previous tasks, so far, knowledge from previous development steps remains entirely untapped. In this work, we remedy this situation and propose a new research framework: hyperparameter transfer across adjustments (HT-AA). To lay a solid foundation for this research framework, we provide four simple HT-AA baseline algorithms and eight benchmarks changing various aspects of ML algorithms, their hyperparameter search spaces, and the neural architectures used. The best baseline, on average and depending on the budgets for the old and new HPO, reaches a given performance 1.2--2.6x faster than a prominent HPO algorithm without transfer. As HPO is a crucial step in ML development but requires extensive computational resources, this speedup would lead to faster development cycles, lower costs, and reduced environmental impacts. To make these benefits available to ML developers off-the-shelf and to facilitate future research on HT-AA, we provide Python packages for our baselines and benchmarks.
Submitted 25 October, 2020;
originally announced October 2020.
-
Sample-Efficient Automated Deep Reinforcement Learning
Authors:
Jörg K. H. Franke,
Gregor Köhler,
André Biedenkapp,
Frank Hutter
Abstract:
Despite significant progress in challenging problems across various domains, applying state-of-the-art deep reinforcement learning (RL) algorithms remains challenging due to their sensitivity to the choice of hyperparameters. This sensitivity can partly be attributed to the non-stationarity of the RL problem, potentially requiring different hyperparameter settings at various stages of the learning process. Additionally, in the RL setting, hyperparameter optimization (HPO) requires a large number of environment interactions, hindering the transfer of the successes in RL to real-world applications. In this work, we tackle the issues of sample-efficient and dynamic HPO in RL. We propose a population-based automated RL (AutoRL) framework to meta-optimize arbitrary off-policy RL algorithms. In this framework, we optimize the hyperparameters and also the neural architecture while simultaneously training the agent. By sharing the collected experience across the population, we substantially increase the sample efficiency of the meta-optimization. We demonstrate the capabilities of our sample-efficient AutoRL approach in a case study with the popular TD3 algorithm in the MuJoCo benchmark suite, where we reduce the number of environment interactions needed for meta-optimization by up to an order of magnitude compared to population-based training.
Submitted 17 March, 2021; v1 submitted 3 September, 2020;
originally announced September 2020.
-
Neural Architecture Evolution in Deep Reinforcement Learning for Continuous Control
Authors:
Jörg K. H. Franke,
Gregor Köhler,
Noor Awad,
Frank Hutter
Abstract:
Current Deep Reinforcement Learning algorithms still heavily rely on handcrafted neural network architectures. We propose a novel approach to automatically find strong topologies for continuous control tasks while only adding a minor overhead in terms of interactions in the environment. To achieve this, we combine Neuroevolution techniques with off-policy training and propose a novel architecture mutation operator. Experiments on five continuous control benchmarks show that the proposed Actor-Critic Neuroevolution algorithm often outperforms the strong Actor-Critic baseline and is capable of automatically finding topologies in a sample-efficient manner which would otherwise have to be found by expensive architecture search.
Submitted 27 February, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
-
Multi-task Localization and Segmentation for X-ray Guided Planning in Knee Surgery
Authors:
Florian Kordon,
Peter Fischer,
Maxim Privalov,
Benedict Swartman,
Marc Schnetzke,
Jochen Franke,
Ruxandra Lasowski,
Andreas Maier,
Holger Kunze
Abstract:
X-ray based measurement and guidance are commonly used tools in orthopaedic surgery to facilitate a minimally invasive workflow. Typically, surgical planning is first performed using knowledge of bone morphology and anatomical landmarks. Information about bone location then serves as a prior for registration during overlay of the planning on intra-operative X-ray images. Performing these steps manually, however, is prone to intra-rater/inter-rater variability and increases task complexity for the surgeon. To remedy these issues, we propose an automatic framework for planning and subsequent overlay. We evaluate it on the example of femoral drill site planning for medial patellofemoral ligament reconstruction surgery. A deep multi-task stacked hourglass network is trained on 149 conventional lateral X-ray images to jointly localize two femoral landmarks, to predict a region of interest for the posterior femoral cortex tangent line, and to perform semantic segmentation of the femur, patella, tibia, and fibula with adaptive task complexity weighting. On 38 clinical test images, the framework achieves a median localization error of 1.50 mm for the femoral drill site and mean IoU scores of 0.99, 0.97, 0.98, and 0.96 for the femur, patella, tibia, and fibula, respectively. The demonstrated approach consistently performs surgical planning at expert-level precision without the need for manual correction.
Submitted 24 July, 2019;
originally announced July 2019.
-
Robust and Scalable Differentiable Neural Computer for Question Answering
Authors:
Jörg Franke,
Jan Niehues,
Alex Waibel
Abstract:
Deep learning models are often not easily adaptable to new tasks and require task-specific adjustments. The differentiable neural computer (DNC), a memory-augmented neural network, is designed as a general problem solver which can be used in a wide range of tasks. But in reality, it is hard to apply this model to new tasks. We analyze the DNC and identify possible improvements within the application of question answering. This motivates a more robust and scalable DNC (rsDNC). The objective precondition is to keep the general character of this model intact while making its application more reliable and speeding up its required training time. The rsDNC is distinguished by more robust training, a slim memory unit, and a bidirectional architecture. We not only achieve new state-of-the-art performance on the bAbI task, but also minimize the performance variance between different initializations. Furthermore, we demonstrate the simplified applicability of the rsDNC to new tasks with passable results on the CNN RC task without adaptations.
Submitted 7 July, 2018;
originally announced July 2018.