-
On the expressivity of deep Heaviside networks
Authors:
Insung Kong,
Juntong Chen,
Sophie Langer,
Johannes Schmidt-Hieber
Abstract:
We show that deep Heaviside networks (DHNs) have limited expressiveness but that this can be overcome by including either skip connections or neurons with linear activation. We provide lower and upper bounds for the Vapnik-Chervonenkis (VC) dimensions and approximation rates of these network classes. As an application, we derive statistical convergence rates for DHN fits in the nonparametric regression model.
Submitted 30 April, 2025;
originally announced May 2025.
-
Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization
Authors:
Gabriel Clara,
Sophie Langer,
Johannes Schmidt-Hieber
Abstract:
We analyze the landscape and training dynamics of diagonal linear networks in a linear regression task, with the network parameters being perturbed by small isotropic normal noise. The addition of such noise may be interpreted as a stochastic form of sharpness-aware minimization (SAM) and we prove several results that relate its action on the underlying landscape and training dynamics to the sharpness of the loss. In particular, the noise changes the expected gradient to force balancing of the weight matrices at a fast rate along the descent trajectory. In the diagonal linear model, we show that this equates to minimizing the average sharpness, as well as the trace of the Hessian matrix, among all possible factorizations of the same matrix. Further, the noise forces the gradient descent iterates towards a shrinkage-thresholding of the underlying true parameter, with the noise level explicitly regulating both the shrinkage factor and the threshold.
Submitted 14 March, 2025;
originally announced March 2025.
-
Self-Supervised Radiograph Anatomical Region Classification -- How Clean Is Your Real-World Data?
Authors:
Simon Langer,
Jessica Ritter,
Rickmer Braren,
Daniel Rueckert,
Paul Hager
Abstract:
Modern deep learning-based clinical imaging workflows rely on accurate labels of the examined anatomical region. Knowing the anatomical region is required to select applicable downstream models and to effectively generate cohorts of high quality data for future medical and machine learning research efforts. However, this information may not be available in externally sourced data or generally contain data entry errors. To address this problem, we show the effectiveness of self-supervised methods such as SimCLR and BYOL as well as supervised contrastive deep learning methods in assigning one of 14 anatomical region classes in our in-house dataset of 48,434 skeletal radiographs. We achieve a strong linear evaluation accuracy of 96.6% with a single model and 97.7% using an ensemble approach. Furthermore, only a few labeled instances (1% of the training set) suffice to achieve an accuracy of 92.2%, enabling usage in low-label and thus low-resource scenarios. Our model can be used to correct data entry mistakes: a follow-up analysis of the test set errors of our best-performing single model by an expert radiologist identified 35% incorrect labels and 11% out-of-domain images. When accounted for, the radiograph anatomical region labelling performance increased -- without and with an ensemble, respectively -- to a theoretical accuracy of 98.0% and 98.8%.
Submitted 20 December, 2024;
originally announced December 2024.
-
The impact of AI on engineering design procedures for dynamical systems
Authors:
Kristin M. de Payrebrune,
Kathrin Flaßkamp,
Tom Ströhla,
Thomas Sattel,
Dieter Bestle,
Benedict Röder,
Peter Eberhard,
Sebastian Peitz,
Marcus Stoffel,
Gulakala Rutwik,
Borse Aditya,
Meike Wohlleben,
Walter Sextro,
Maximilian Raff,
C. David Remy,
Manish Yadav,
Merten Stender,
Jan van Delden,
Timo Lüddecke,
Sabine C. Langer,
Julius Schultz,
Christopher Blech
Abstract:
Artificial intelligence (AI) is driving transformative changes across numerous fields, revolutionizing conventional processes and creating new opportunities for innovation. The development of mechatronic systems is undergoing a similar transformation. Over the past decade, modeling, simulation, and optimization techniques have become integral to the design process, paving the way for the adoption of AI-based methods. In this paper, we examine the potential for integrating AI into the engineering design process, using the V-model from the VDI guideline 2206, considered the state-of-the-art in product design, as a foundation. We identify and classify AI methods based on their suitability for specific stages within the engineering product design workflow. Furthermore, we present a series of application examples where AI-assisted design has been successfully implemented by the authors. These examples, drawn from research projects within the DFG Priority Program SPP 2353: Daring More Intelligence - Design Assistants in Mechanics and Dynamics, showcase a diverse range of applications across mechanics and mechatronics, including areas such as acoustics and robotics.
Submitted 16 December, 2024;
originally announced December 2024.
-
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications
Authors:
Monica Riedler,
Stefan Langer
Abstract:
Large Language Models (LLMs) have demonstrated impressive capabilities in answering questions, but they lack domain-specific knowledge and are prone to hallucinations. Retrieval Augmented Generation (RAG) is one approach to address these challenges, while multimodal models are emerging as promising AI assistants for processing both text and images. In this paper we describe a series of experiments aimed at determining how to best integrate multimodal models into RAG systems for the industrial domain. The purpose of the experiments is to determine whether including images alongside text from documents within the industrial domain increases RAG performance and to find the optimal configuration for such a multimodal RAG system. Our experiments include two approaches for image processing and retrieval, as well as two LLMs (GPT4-Vision and LLaVA) for answer synthesis. These image processing strategies involve the use of multimodal embeddings and the generation of textual summaries from images. We evaluate our experiments with an LLM-as-a-Judge approach. Our results reveal that multimodal RAG can outperform single-modality RAG settings, although image retrieval poses a greater challenge than text retrieval. Additionally, leveraging textual summaries from images presents a more promising approach compared to the use of multimodal embeddings, providing more opportunities for future advancements.
Submitted 29 October, 2024;
originally announced October 2024.
-
On the VC dimension of deep group convolutional neural networks
Authors:
Anna Sepliarskaia,
Sophie Langer,
Johannes Schmidt-Hieber
Abstract:
We study the generalization capabilities of Group Convolutional Neural Networks (GCNNs) with ReLU activation function by deriving upper and lower bounds for their Vapnik-Chervonenkis (VC) dimension. Specifically, we analyze how factors such as the number of layers, weights, and input dimension affect the VC dimension. We further compare the derived bounds to those known for other types of neural networks. Our findings extend previous results on the VC dimension of continuous GCNNs with two layers, thereby providing new insights into the generalization properties of GCNNs, particularly regarding the dependence on the input resolution of the data.
Submitted 21 October, 2024;
originally announced October 2024.
-
Efficient low rank model order reduction of vibroacoustic problems under stochastic loads
Authors:
Yannik Hüpel,
Ulrich Römer,
Matthias Bollhöfer,
Sabine Langer
Abstract:
This contribution combines a low-rank matrix approximation through Singular Value Decomposition (SVD) with second-order Krylov subspace-based Model Order Reduction (MOR), in order to efficiently propagate input uncertainties through a given vibroacoustic model. The vibroacoustic model consists of a plate coupled to a fluid into which the plate radiates sound due to a turbulent boundary layer excitation. This excitation is subject to uncertainties due to the stochastic nature of the turbulence and the computational cost of simulating the coupled problem with stochastic forcing is very high. The proposed method approximates the output uncertainties in an efficient way, by reducing the evaluation cost of the model in terms of DOFs and samples by using the factors of the SVD low-rank approximation directly as input for the MOR algorithm. Here, the covariance matrix of the vector of unknowns can efficiently be approximated with only a fraction of the original number of evaluations. Therefore, the approach is a promising step to further reducing the computational effort of large-scale vibroacoustic evaluations.
Submitted 15 August, 2024;
originally announced August 2024.
-
CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature
Authors:
Stefan Langer,
Fabian Neuhaus,
Andreas Nürnberger
Abstract:
Ontologies are formal representations of knowledge in specific domains that provide a structured framework for organizing and understanding complex information. Creating ontologies, however, is a complex and time-consuming endeavor. ChEBI is a well-known ontology in the field of chemistry, which provides a comprehensive resource for defining chemical entities and their properties. However, it covers only a small fraction of the rapidly growing knowledge in chemistry and does not provide references to the scientific literature. To address this, we propose a methodology that involves augmenting existing annotated text corpora with knowledge from ChEBI and fine-tuning a large language model (LLM) to recognize chemical entities and their roles in scientific text. Our experiments demonstrate the effectiveness of our approach. By combining ontological knowledge and the language understanding capabilities of LLMs, we achieve high precision and recall rates in identifying both the chemical entities and roles in scientific literature. Furthermore, we extract them from a set of 8,000 ChemRxiv articles, and apply a second LLM to create a knowledge graph (KG) of chemical entities and roles (CEAR), which provides complementary information to ChEBI, and can help to extend it.
Submitted 31 July, 2024;
originally announced July 2024.
-
FhGenie: A Custom, Confidentiality-preserving Chat AI for Corporate and Scientific Use
Authors:
Ingo Weber,
Hendrik Linka,
Daniel Mertens,
Tamara Muryshkin,
Heinrich Opgenoorth,
Stefan Langer
Abstract:
Since OpenAI's release of ChatGPT, generative AI has received significant attention across various domains. These AI-based chat systems have the potential to enhance the productivity of knowledge workers in diverse tasks. However, the use of free public services poses a risk of data leakage, as service providers may exploit user input for additional training and optimization without clear boundaries. Even subscription-based alternatives sometimes lack transparency in handling user data. To address these concerns and enable Fraunhofer staff to leverage this technology while ensuring confidentiality, we have designed and developed a customized chat AI called FhGenie (genie being a reference to a helpful spirit). Within few days of its release, thousands of Fraunhofer employees started using this service. As pioneers in implementing such a system, many other organizations have followed suit. Our solution builds upon commercial large language models (LLMs), which we have carefully integrated into our system to meet our specific requirements and compliance constraints, including confidentiality and GDPR. In this paper, we share detailed insights into the architectural considerations, design, implementation, and subsequent updates of FhGenie. Additionally, we discuss challenges, observations, and the core lessons learned from its productive usage.
Submitted 29 February, 2024;
originally announced March 2024.
-
RIDGE: Reproducibility, Integrity, Dependability, Generalizability, and Efficiency Assessment of Medical Image Segmentation Models
Authors:
Farhad Maleki,
Linda Moy,
Reza Forghani,
Tapotosh Ghosh,
Katie Ovens,
Steve Langer,
Pouria Rouzrokh,
Bardia Khosravi,
Ali Ganjizadeh,
Daniel Warren,
Roxana Daneshjou,
Mana Moassefi,
Atlas Haddadi Avval,
Susan Sotardi,
Neil Tenenholtz,
Felipe Kitamura,
Timothy Kline
Abstract:
Deep learning techniques hold immense promise for advancing medical image analysis, particularly in tasks like image segmentation, where precise annotation of regions or volumes of interest within medical images is crucial but manually laborious and prone to interobserver and intraobserver biases. As such, deep learning approaches could provide automated solutions for such applications. However, the potential of these techniques is often undermined by challenges in reproducibility and generalizability, which are key barriers to their clinical adoption. This paper introduces the RIDGE checklist, a comprehensive framework designed to assess the Reproducibility, Integrity, Dependability, Generalizability, and Efficiency of deep learning-based medical image segmentation models. The RIDGE checklist is not just a tool for evaluation but also a guideline for researchers striving to improve the quality and transparency of their work. By adhering to the principles outlined in the RIDGE checklist, researchers can ensure that their developed segmentation models are robust, scientifically valid, and applicable in a clinical setting.
Submitted 3 July, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Learning to Predict Structural Vibrations
Authors:
Jan van Delden,
Julius Schultz,
Christopher Blech,
Sabine C. Langer,
Timo Lüddecke
Abstract:
In mechanical structures like airplanes, cars and houses, noise is generated and transmitted through vibrations. To take measures to reduce this noise, vibrations need to be simulated with expensive numerical computations. Deep learning surrogate models present a promising alternative to classical numerical simulations as they can be evaluated magnitudes faster, while trading-off accuracy. To quantify such trade-offs systematically and foster the development of methods, we present a benchmark on the task of predicting the vibration of harmonically excited plates. The benchmark features a total of 12,000 plate geometries with varying forms of beadings, material, boundary conditions, load position and sizes with associated numerical solutions. To address the benchmark task, we propose a new network architecture, named Frequency-Query Operator, which predicts vibration patterns of plate geometries given a specific excitation frequency. Applying principles from operator learning and implicit models for shape encoding, our approach effectively addresses the prediction of highly variable frequency response functions occurring in dynamic systems. To quantify the prediction quality, we introduce a set of evaluation metrics and evaluate the method on our vibrating-plates benchmark. Our method outperforms DeepONets, Fourier Neural Operators and more traditional neural network architectures and can be used for design optimization. Code, dataset and visualizations: https://github.com/ecker-lab/Learning_Vibrating_Plates
Submitted 3 December, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Efficient solution strategies for cabin noise assessment of a wave resolving aircraft fuselage model
Authors:
Christopher Blech,
Harikrishnan K. Sreekumar,
Yannik Hüpel,
Sabine C. Langer
Abstract:
For the purpose of high-fidelity aircraft cabin noise simulations during early design phases, we study three efficient solving approaches for the fully coupled finite element model of an aircraft fuselage segment. Obtaining an efficient solution with respect to consumed computational time and resources is challenging within a conventional simulation pipeline, as large-scale and complex vibroacoustic models demand crucially high computational costs with increasing frequency. In this contribution, we adopt (1) frequency and domain-adaptive discretisation, (2) domain-decomposition techniques, and (3) model order reduction with rational Arnoldi Krylov subspace methods for an aircraft fuselage model. The three approaches have shown remarkable advantage thereby reducing the solving time as well as the memory requirement that are essential when solving large-scale models. While the discretisation and the model order reduction approaches accelerate the solving process by efficiently handling the complexity of the system to be solved, domain-decomposition techniques further handle the aspect of reducing the overall memory consumption. Finally with the help of active research aircraft models, we implement and showcase the achieved efficiency.
Submitted 7 October, 2023;
originally announced October 2023.
-
Learning Green's Function Efficiently Using Low-Rank Approximations
Authors:
Kishan Wimalawarne,
Taiji Suzuki,
Sophie Langer
Abstract:
Learning the Green's function using deep learning models makes it possible to solve different classes of partial differential equations. A practical limitation of using deep learning for the Green's function is the repeated, computationally expensive Monte-Carlo integral approximations. We propose to learn the Green's function by low-rank decomposition, which results in a novel architecture that removes redundant computations by learning separately with domain data for evaluation and Monte-Carlo samples for integral approximation. Using experiments, we show that the proposed method improves computational time compared to MOD-Net while achieving accuracy comparable to both PINNs and MOD-Net.
Submitted 1 August, 2023;
originally announced August 2023.
-
SEMPAI: a Self-Enhancing Multi-Photon Artificial Intelligence for prior-informed assessment of muscle function and pathology
Authors:
Alexander Mühlberg,
Paul Ritter,
Simon Langer,
Chloë Goossens,
Stefanie Nübler,
Dominik Schneidereit,
Oliver Taubmann,
Felix Denzinger,
Dominik Nörenberg,
Michael Haug,
Wolfgang H. Goldmann,
Andreas K. Maier,
Oliver Friedrich,
Lucas Kreiss
Abstract:
Deep learning (DL) shows notable success in biomedical studies. However, most DL algorithms work as a black box, exclude biomedical experts, and need extensive data. We introduce the Self-Enhancing Multi-Photon Artificial Intelligence (SEMPAI), that integrates hypothesis-driven priors in a data-driven DL approach for research on multiphoton microscopy (MPM) of muscle fibers. SEMPAI utilizes meta-learning to optimize prior integration, data representation, and neural network architecture simultaneously. This allows hypothesis testing and provides interpretable feedback about the origin of biological information in MPM images. SEMPAI performs joint learning of several tasks to enable prediction for small datasets. The method is applied on an extensive multi-study dataset resulting in the largest joint analysis of pathologies and function for single muscle fibers. SEMPAI outperforms state-of-the-art biomarkers in six of seven predictive tasks, including those with scarce data. SEMPAI's DL models with integrated priors are superior to those without priors and to prior-only machine learning approaches.
Submitted 28 October, 2022;
originally announced October 2022.
-
Domain Adaptive Pretraining for Multilingual Acronym Extraction
Authors:
Usama Yaseen,
Stefan Langer
Abstract:
This paper presents our findings from participating in the multilingual acronym extraction shared task SDU@AAAI-22. The task consists of acronym extraction from documents in 6 languages within scientific and legal domains. To address multilingual acronym extraction we employed BiLSTM-CRF with multilingual XLM-RoBERTa embeddings. We pretrained the XLM-RoBERTa model on the shared task corpus to further adapt XLM-RoBERTa embeddings to the shared task domain(s). Our system (team: SMR-NLP) achieved competitive performance for acronym extraction across all the languages.
Submitted 30 June, 2022;
originally announced June 2022.
-
DeepTechnome: Mitigating Unknown Bias in Deep Learning Based Assessment of CT Images
Authors:
Simon Langer,
Oliver Taubmann,
Felix Denzinger,
Andreas Maier,
Alexander Mühlberg
Abstract:
Reliably detecting diseases using relevant biological information is crucial for real-world applicability of deep learning techniques in medical imaging. We debias deep learning models during training against unknown bias - without preprocessing/filtering the input beforehand or assuming specific knowledge about its distribution or precise nature in the dataset. We use control regions as surrogates that carry information regarding the bias, employ the classifier model to extract features, and suppress biased intermediate features with our custom, modular DecorreLayer. We evaluate our method on a dataset of 952 lung computed tomography scans by introducing simulated biases w.r.t. reconstruction kernel and noise level and propose including an adversarial test set in evaluations of bias reduction techniques. In a moderately sized model architecture, applying the proposed method to learn from data exhibiting a strong bias, it near-perfectly recovers the classification performance observed when training with corresponding unbiased data.
Submitted 26 May, 2022;
originally announced May 2022.
-
Best Practices and Scoring System on Reviewing A.I. based Medical Imaging Papers: Part 1 Classification
Authors:
Timothy L. Kline,
Felipe Kitamura,
Ian Pan,
Amine M. Korchi,
Neil Tenenholtz,
Linda Moy,
Judy Wawira Gichoya,
Igor Santos,
Steven Blumer,
Misha Ysabel Hwang,
Kim-Ann Git,
Abishek Shroff,
Elad Walach,
George Shih,
Steve Langer
Abstract:
With the recent advances in A.I. methodologies and their application to medical imaging, there has been an explosion of related research programs utilizing these techniques to produce state-of-the-art classification performance. Ultimately, these research programs culminate in submission of their work for consideration in peer reviewed journals. To date, the criteria for acceptance vs. rejection is often subjective; however, reproducible science requires reproducible review. The Machine Learning Education Sub-Committee of SIIM has identified a knowledge gap and a serious need to establish guidelines for reviewing these studies. Although there have been several recent papers with this goal, this present work is written from the machine learning practitioners standpoint. In this series, the committee will address the best practices to be followed in an A.I.-based study and present the required sections in terms of examples and discussion of what should be included to make the studies cohesive, reproducible, accurate, and self-contained. This first entry in the series focuses on the task of image classification. Elements such as dataset curation, data pre-processing steps, defining an appropriate reference standard, data partitioning, model architecture and training are discussed. The sections are presented as they would be detailed in a typical manuscript, with content describing the necessary information that should be included to make sure the study is of sufficient quality to be considered for publication. The goal of this series is to provide resources to not only help improve the review process for A.I.-based medical imaging papers, but to facilitate a standard for the information that is presented within all components of the research study. We hope to provide quantitative metrics in what otherwise may be a qualitative review process.
Submitted 3 February, 2022;
originally announced February 2022.
-
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Authors:
Kaustubh D. Dhole,
Varun Gangal,
Sebastian Gehrmann,
Aadesh Gupta,
Zhenhao Li,
Saad Mahamood,
Abinaya Mahendiran,
Simon Mille,
Ashish Shrivastava,
Samson Tan,
Tongshuang Wu,
Jascha Sohl-Dickstein,
Jinho D. Choi,
Eduard Hovy,
Ondrej Dusek,
Sebastian Ruder,
Sajant Anand,
Nagender Aneja,
Rabin Banjade,
Lisa Barthe,
Hanna Behnke,
Ian Berlot-Attwell,
Connor Boyle,
Caroline Brun,
Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)
Abstract:
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
Submitted 11 October, 2022; v1 submitted 5 December, 2021;
originally announced December 2021.
-
Simulated LiDAR Repositioning: a novel point cloud data augmentation method
Authors:
Xavier Morin-Duchesne,
Michael S Langer
Abstract:
We address a data augmentation problem for LiDAR. Given a LiDAR scan of a scene from some position, how can one simulate new scans of that scene from different, secondary positions? The method defines criteria for selecting valid secondary positions, and then estimates which points from the original point cloud would be acquired by a scanner from these positions. We validate the method using synthetic scenes, and examine how the similarity of generated point clouds depends on scanner distance, occlusion, and angular resolution. We show that the method is more accurate at short distances, and that having a high scanner resolution for the original point clouds has a strong impact on the similarity of generated point clouds. We also demonstrate how the method can be applied to natural scene statistics: in particular, we apply our method to reposition the scanner horizontally and vertically, separately consider points belonging to the ground and to non-ground objects, and describe the impact on the distributions of distances to these two classes of points.
Submitted 20 November, 2021;
originally announced November 2021.
-
Data Augmentation for Low-Resource Named Entity Recognition Using Backtranslation
Authors:
Usama Yaseen,
Stefan Langer
Abstract:
State-of-the-art natural language processing systems rely on sizable training datasets to achieve high performance. The lack of such datasets in specialized low-resource domains leads to suboptimal performance. In this work, we adapt backtranslation to generate high-quality and linguistically diverse synthetic data for low-resource named entity recognition. We perform experiments on two datasets from the materials science (MaSciP) and biomedical domains (S800). The empirical results demonstrate the effectiveness of our proposed augmentation strategy, particularly in the low-resource scenario.
Submitted 26 August, 2021;
originally announced August 2021.
-
Neural Text Classification and Stacked Heterogeneous Embeddings for Named Entity Recognition in SMM4H 2021
Authors:
Usama Yaseen,
Stefan Langer
Abstract:
This paper presents our findings from participating in the SMM4H Shared Task 2021. We addressed Named Entity Recognition (NER) and Text Classification. To address NER, we explored BiLSTM-CRF with Stacked Heterogeneous Embeddings and linguistic features. We investigated various machine learning algorithms (logistic regression, Support Vector Machine (SVM) and Neural Networks) to address text classification. Our proposed approaches can be generalized to different languages, and we have shown their effectiveness for English and Spanish. Our text classification submissions (team: MIC-NLP) achieved competitive performance, with F1-scores of $0.46$ and $0.90$ on ADE Classification (Task 1a) and Profession Classification (Task 7a) respectively. In the case of NER, our submissions achieved F1-scores of $0.50$ and $0.82$ on ADE Span Detection (Task 1b) and Profession Span Detection (Task 7b) respectively.
Submitted 11 June, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Combining Gesture and Voice Control for Mid-Air Manipulation of CAD Models in VR Environments
Authors:
Markus Friedrich,
Stefan Langer,
Fabian Frey
Abstract:
Modeling 3D objects in domains like Computer Aided Design (CAD) is time-consuming and comes with a steep learning curve needed to master the design process as well as tool complexities. In order to simplify the modeling process, we designed and implemented a prototypical system that leverages the strengths of Virtual Reality (VR) hand gesture recognition in combination with the expressiveness of a voice-based interface for the task of 3D modeling. Furthermore, we use the Constructive Solid Geometry (CSG) tree representation for 3D models within the VR environment to let the user manipulate objects from the ground up, giving an intuitive understanding of how the underlying basic shapes connect. The system uses standard mid-air 3D object manipulation techniques and adds a set of voice commands to help mitigate the deficiencies of current hand gesture recognition techniques. A user study was conducted to evaluate the proposed prototype. Our hybrid input paradigm proves to be a promising step towards easier-to-use CAD modeling.
Submitted 18 November, 2020;
originally announced November 2020.
-
Approximating smooth functions by deep neural networks with sigmoid activation function
Authors:
Sophie Langer
Abstract:
We study the power of deep neural networks (DNNs) with sigmoid activation function. Recently, it was shown that DNNs approximate any $d$-dimensional, smooth function on a compact set with a rate of order $W^{-p/d}$, where $W$ is the number of nonzero weights in the network and $p$ is the smoothness of the function. Unfortunately, these rates only hold for a special class of sparsely connected DNNs. We ask ourselves if we can show the same approximation rate for a simpler and more general class, i.e., DNNs which are defined only by their width and depth. In this article we show that DNNs with fixed depth and a width of order $M^d$ achieve an approximation rate of $M^{-2p}$. As a conclusion we quantitatively characterize the approximation power of DNNs in terms of the overall weights $W_0$ in the network and show an approximation rate of $W_0^{-p/d}$. This more general result finally helps us to understand which network topology guarantees a special target accuracy.
Submitted 8 October, 2020;
originally announced October 2020.
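The link between the two rates stated in the abstract above, $M^{-2p}$ and $W_0^{-p/d}$, can be made explicit with a short calculation. This is only a sketch under an assumption not stated in the abstract: that a fully connected network of fixed depth $L$ and width $r$ has on the order of $L r^2$ weights. The symbols $L$, $r$ and the constant $c$ are introduced here for illustration only.

```latex
\begin{align*}
r &= c\,M^{d}, \qquad W_0 \asymp L\,r^{2} \asymp M^{2d}
  \quad (\text{$L$ and $c$ fixed}),\\
M &\asymp W_0^{1/(2d)},\\
M^{-2p} &\asymp \bigl(W_0^{1/(2d)}\bigr)^{-2p} = W_0^{-p/d}.
\end{align*}
```

Under this assumption the width-based rate and the weight-based rate describe the same scaling, expressed in different network parameters.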
-
A Patient-Centric Dataset of Images and Metadata for Identifying Melanomas Using Clinical Context
Authors:
Veronica Rotemberg,
Nicholas Kurtansky,
Brigid Betz-Stablein,
Liam Caffery,
Emmanouil Chousakos,
Noel Codella,
Marc Combalia,
Stephen Dusza,
Pascale Guitera,
David Gutman,
Allan Halpern,
Harald Kittler,
Kivanc Kose,
Steve Langer,
Konstantinos Lioprys,
Josep Malvehy,
Shenara Musthaq,
Jabpani Nanda,
Ofer Reiter,
George Shih,
Alexander Stratigos,
Philipp Tschandl,
Jochen Weber,
H. Peter Soyer
Abstract:
Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another. This patient-level contextual information is frequently used by clinicians to diagnose melanoma and is especially useful in ruling out false positives in patients with many atypical nevi. The dataset represents 2,056 patients from three continents with an average of 16 lesions per patient, consisting of 33,126 dermoscopic images and 584 histopathologically confirmed melanomas compared with benign melanoma mimickers.
Submitted 7 August, 2020;
originally announced August 2020.
-
Content-based Recommendations for Radio Stations with Deep Learned Audio Fingerprints
Authors:
Stefan Langer,
Liza Obermeier,
André Ebert,
Markus Friedrich,
Emma Munisamy,
Claudia Linnhoff-Popien
Abstract:
The world of linear radio broadcasting is characterized by a wide variety of stations and played content. That is why finding stations playing the preferred content is a tough task for a potential listener, especially due to the overwhelming number of offered choices. Here, recommender systems usually step in, but existing content-based approaches rely on metadata and thus are constrained by the available data quality. Other approaches leverage user behavior data and thus do not exploit any domain-specific knowledge and are furthermore disadvantageous regarding privacy concerns. Therefore, we propose a new pipeline for the generation of audio-based radio station fingerprints relying on audio stream crawling and a Deep Autoencoder. We show that the proposed fingerprints are especially useful for characterizing radio stations by their audio content and thus are an excellent representation for meaningful and reliable radio station recommendations. Furthermore, the proposed modules are part of the HRADIO Communication Platform, which enables hybrid radio features for radio stations. It is released under a flexible open source license and enables small- and medium-sized businesses in particular to provide customized, high-quality radio services to potential listeners.
Submitted 15 July, 2020;
originally announced July 2020.
-
Enabling Machine Learning-Ready HPC Ensembles with Merlin
Authors:
J. Luc Peterson,
Ben Bay,
Joe Koning,
Peter Robinson,
Jessica Semler,
Jeremy White,
Rushil Anirudh,
Kevin Athey,
Peer-Timo Bremer,
Francesco Di Natale,
David Fox,
Jim A. Gaffney,
Sam A. Jacobs,
Bhavya Kailkhura,
Bogdan Kustowski,
Steven Langer,
Brian Spears,
Jayaraman Thiagarajan,
Brian Van Essen,
Jae-Seung Yeom
Abstract:
With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large scale ensemble data. With complexities such as multi-component workflows, heterogeneous machine architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high performance computing (HPC) environment. In this paper, we present Merlin, a workflow framework to enable large ML-friendly ensembles of scientific HPC simulations. By augmenting traditional HPC with distributed compute technologies, Merlin aims to lower the barrier for scientific subject matter experts to incorporate ML into their analysis. In addition to its design, we describe some example applications that Merlin has enabled on leadership-class HPC resources, such as the ML-augmented optimization of nuclear fusion experiments and the calibration of infectious disease models to study the progression of and possible mitigation strategies for COVID-19.
Submitted 1 July, 2021; v1 submitted 5 December, 2019;
originally announced December 2019.
-
Estimation of a function of low local dimensionality by deep neural networks
Authors:
Michael Kohler,
Adam Krzyzak,
Sophie Langer
Abstract:
Deep neural networks (DNNs) achieve impressive results for complicated tasks like object detection on images and speech recognition. Motivated by this practical success, there is now a strong interest in showing good theoretical properties of DNNs. To describe for which tasks DNNs perform well and when they fail, it is a key challenge to understand their performance. The aim of this paper is to contribute to the current statistical theory of DNNs. We apply DNNs on high dimensional data and we show that the least squares regression estimates using DNNs are able to achieve dimensionality reduction in case that the regression function has locally low dimensionality. Consequently, the rate of convergence of the estimate does not depend on its input dimension $d$, but on its local dimension $d^*$ and the DNNs are able to circumvent the curse of dimensionality in case that $d^*$ is much smaller than $d$. In our simulation study we provide numerical experiments to support our theoretical result and we compare our estimate with other conventional nonparametric regression estimates. The performance of our estimates is also validated in experiments with real data.
Submitted 15 June, 2020; v1 submitted 29 August, 2019;
originally announced August 2019.
-
On the rate of convergence of fully connected very deep neural network regression estimates
Authors:
Michael Kohler,
Sophie Langer
Abstract:
Recent results in nonparametric regression show that deep learning, i.e., neural network estimates with many hidden layers, are able to circumvent the so-called curse of dimensionality in case that suitable restrictions on the structure of the regression function hold. One key feature of the neural networks used in these results is that their network architecture has a further constraint, namely the network sparsity. In this paper we show that we can get similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions. Here either the number of neurons per hidden layer is fixed and the number of hidden layers tends to infinity suitably fast for sample size tending to infinity, or the number of hidden layers is bounded by some logarithmic factor in the sample size and the number of neurons per hidden layer tends to infinity suitably fast for sample size tending to infinity. The proof is based on new approximation results concerning deep neural networks.
Submitted 29 September, 2020; v1 submitted 29 August, 2019;
originally announced August 2019.
-
Difficulty Classification of Mountainbike Downhill Trails utilizing Deep Neural Networks
Authors:
Stefan Langer,
Robert Müller,
Kyrill Schmid,
Claudia Linnhoff-Popien
Abstract:
The difficulty of mountainbike downhill trails is a subjective perception. However, sports-associations and mountainbike park operators attempt to group trails into different levels of difficulty with scales like the Singletrail-Skala (S0-S5) or colored scales (blue, red, black, ...) as proposed by The International Mountain Bicycling Association. Inconsistencies in difficulty grading occur due to the various scales, different people grading the trails, differences in topography, and more. We propose an end-to-end deep learning approach to classify trails into three difficulties easy, medium, and hard by using sensor data. With mbientlab Meta Motion r0.2 sensor units, we record accelerometer- and gyroscope data of one rider on multiple trail segments. A 2D convolutional neural network is trained with a stacked and concatenated representation of the aforementioned data as its input. We run experiments with five different sample- and five different kernel sizes and achieve a maximum Sparse Categorical Accuracy of 0.9097. To the best of our knowledge, this is the first work targeting computational difficulty classification of mountainbike downhill trails.
Submitted 5 August, 2019;
originally announced August 2019.
-
Soccer Team Vectors
Authors:
Robert Müller,
Stefan Langer,
Fabian Ritz,
Christoph Roch,
Steffen Illium,
Claudia Linnhoff-Popien
Abstract:
In this work we present STEVE - Soccer TEam VEctors, a principled approach for learning real valued vectors for soccer teams where similar teams are close to each other in the resulting vector space. STEVE only relies on freely available information about the matches teams played in the past. These vectors can serve as input to various machine learning tasks. Evaluating on the task of team market value estimation, STEVE outperforms all its competitors. Moreover, we use STEVE for similarity search and to rank soccer teams.
Submitted 31 March, 2020; v1 submitted 30 July, 2019;
originally announced August 2019.
-
Deep Neural Baselines for Computational Paralinguistics
Authors:
Daniel Elsner,
Stefan Langer,
Fabian Ritz,
Robert Müller,
Steffen Illium
Abstract:
Detecting sleepiness from spoken language is an ambitious task, which is addressed by the Interspeech 2019 Computational Paralinguistics Challenge (ComParE). We propose an end-to-end deep learning approach to detect and classify patterns reflecting sleepiness in the human voice. Our approach is based solely on a moderately complex deep neural network architecture. It may be applied directly to the audio data without requiring any specific feature engineering, thus remaining transferable to other audio classification tasks. Nevertheless, our approach performs similarly to state-of-the-art machine learning models.
Submitted 5 July, 2019;
originally announced July 2019.
-
Learning crystal plasticity using digital image correlation: Examples from discrete dislocation dynamics
Authors:
Stefanos Papanikolaou,
Michail Tzimas,
Andrew C. E. Reid,
Stephen A. Langer
Abstract:
Digital image correlation (DIC) is a well-established, non-invasive technique for tracking and quantifying the deformation of mechanical samples under strain. While it provides an obvious way to observe incremental and aggregate displacement information, it seems likely that DIC data sets, which after all reflect the spatially-resolved response of a microstructure to loads, contain much richer information than has generally been extracted from them. In this paper, we demonstrate a machine-learning approach to quantifying the prior deformation history of a crystalline sample based on its response to a subsequent DIC test. This prior deformation history is encoded in the microstructure through the inhomogeneity of the dislocation microstructure, and in the spatial correlations of the dislocation patterns, which mediate the system's response to the DIC test load. Our domain consists of deformed crystalline thin films generated by a discrete dislocation plasticity simulation. We explore the range of applicability of machine learning (ML) for typical experimental protocols, and as a function of possible size effects and stochasticity. Plasticity size effects may directly influence the data, rendering unsupervised techniques unable to distinguish different plasticity regimes.
Submitted 13 April, 2019; v1 submitted 24 September, 2017;
originally announced September 2017.
-
Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned
Authors:
Stefan Langer,
Joeran Beel
Abstract:
For the past few years, we have used Apache Lucene as the recommendation framework in our scholarly-literature recommender system of the reference-management software Docear. In this paper, we share three lessons learned from our work with Lucene. First, recommendations with relevance scores below 0.025 tend to have significantly lower click-through rates than recommendations with relevance scores above 0.025. Second, by picking ten recommendations randomly from Lucene's top50 search results, click-through rate decreased by 15%, compared to recommending the top10 results. Third, the number of returned search results tends to predict how high click-through rates will be: when Lucene returns fewer than 1,000 search results, click-through rates tend to be around half as high as when 1,000+ results are returned.
Submitted 26 March, 2017;
originally announced March 2017.