-
Kronecker-factored Approximate Curvature (KFAC) From Scratch
Authors:
Felix Dangel,
Bálint Mucsányi,
Tobias Weber,
Runa Eschenhagen
Abstract:
Kronecker-factored approximate curvature (KFAC) is arguably one of the most prominent curvature approximations in deep learning. Its applications range from optimization to Bayesian deep learning, training data attribution with influence functions, and model compression or merging. While the intuition behind KFAC is easy to understand, its implementation is tedious: It comes in many flavours, has common pitfalls when translating the math to code, and is challenging to test, which complicates ensuring a properly functioning implementation. Some of the authors themselves have dealt with these challenges and experienced the discomfort of not being able to fully test their code. Thanks to recent advances in understanding KFAC, we are now able to provide test cases and a recipe for a reliable KFAC implementation. This tutorial is meant as a ground-up introduction to KFAC. In contrast to the existing work, our focus lies on providing both math and code side-by-side and providing test cases based on the latest insights into KFAC that are scattered throughout the literature. We hope this tutorial provides a contemporary view of KFAC that allows beginners to gain a deeper understanding of this curvature approximation while lowering the barrier to its implementation, extension, and usage in practice.
Submitted 1 July, 2025;
originally announced July 2025.
-
Unraveling Media Perspectives: A Comprehensive Methodology Combining Large Language Models, Topic Modeling, Sentiment Analysis, and Ontology Learning to Analyse Media Bias
Authors:
Orlando Jähde,
Thorsten Weber,
Rüdiger Buchkremer
Abstract:
Biased news reporting poses a significant threat to informed decision-making and the functioning of democracies. This study introduces a novel methodology for scalable, minimally biased analysis of media bias in political news. The proposed approach examines event selection, labeling, word choice, and commission and omission biases across news sources by leveraging natural language processing techniques, including hierarchical topic modeling, sentiment analysis, and ontology learning with large language models. Through three case studies related to current political events, we demonstrate the methodology's effectiveness in identifying biases across news sources at various levels of granularity. This work represents a significant step towards scalable, minimally biased media bias analysis, laying the groundwork for tools to help news consumers navigate an increasingly complex media landscape.
Submitted 3 May, 2025;
originally announced May 2025.
-
Sound and Complete Invariant-Based Heap Encodings (Technical Report)
Authors:
Zafer Esen,
Philipp Rümmer,
Tjark Weber
Abstract:
Verification of programs operating on mutable, heap-allocated data structures poses significant challenges due to potentially unbounded structures like linked lists and trees. In this paper, we present a novel relational heap encoding leveraging uninterpreted predicates and prophecy variables, reducing heap verification tasks to satisfiability checks over integers in constrained Horn clauses (CHCs). To the best of our knowledge, our approach is the first invariant-based method that achieves both soundness and completeness for heap-manipulating programs. We provide formal proofs establishing the correctness of our encodings. Through an experimental evaluation we demonstrate that our method significantly extends the capability of existing CHC-based verification tools, allowing automatic verification of programs with heap previously unreachable by state-of-the-art tools.
Submitted 23 April, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
-
Explainability for Embedding AI: Aspirations and Actuality
Authors:
Thomas Weber
Abstract:
With artificial intelligence (AI) embedded in many everyday software systems, effectively and reliably developing and maintaining AI systems becomes an essential skill for software developers. However, the complexity inherent to AI poses new challenges. Explainable AI (XAI) may allow developers to understand better the systems they build, which, in turn, can help with tasks like debugging. In this paper, we report insights from a series of surveys with software developers that highlight that there is indeed an increased need for explanatory tools to support developers in creating AI systems. However, the feedback also indicates that existing XAI systems still fall short of this aspiration. Thus, we see an unmet need to provide developers with adequate support mechanisms to cope with this complexity so they can embed AI into high-quality software in the future.
Submitted 20 April, 2025;
originally announced April 2025.
-
Drums of high width
Authors:
Alex Davies,
Prateek Gupta,
Sebastien Racaniere,
Grzegorz Swirszcz,
Adam Zsolt Wagner,
Theophane Weber,
Geordie Williamson
Abstract:
We provide a family of $5$-dimensional prismatoids whose width grows linearly in the number of vertices. This provides a new infinite family of counter-examples to the Hirsch conjecture whose excess width grows linearly in the number of vertices, and answers a question of Matschke, Santos and Weibel.
Submitted 12 March, 2025;
originally announced March 2025.
-
NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model
Authors:
Yuzhi Lai,
Shenghai Yuan,
Youssef Nassar,
Mingyu Fan,
Thomas Weber,
Matthias Rätsch
Abstract:
Effective Human-Robot Interaction (HRI) is crucial for future service robots in aging societies. Existing solutions are biased toward only well-trained objects, creating a gap when dealing with new objects. Currently, HRI systems using predefined gestures or language tokens for pretrained objects pose challenges for all individuals, especially elderly ones. These challenges include difficulties in recalling commands, memorizing hand gestures, and learning new names. This paper introduces NVP-HRI, an intuitive multi-modal HRI paradigm that combines voice commands and deictic posture. NVP-HRI utilizes the Segment Anything Model (SAM) to analyze visual cues and depth data, enabling precise structural object representation. Through a pre-trained SAM network, NVP-HRI allows interaction with new objects via zero-shot prediction, even without prior knowledge. NVP-HRI also integrates with a large language model (LLM) for multimodal commands, coordinating them with object selection and scene distribution in real time for collision-free trajectory solutions. We also regulate the action sequence with the essential control syntax to reduce LLM hallucination risks. The evaluation of diverse real-world tasks using a Universal Robot showcased up to 59.2\% efficiency improvement over traditional gesture control, as illustrated in the video https://youtu.be/EbC7al2wiAc. Our code and design will be openly available at https://github.com/laiyuzhi/NVP-HRI.git.
Submitted 12 March, 2025;
originally announced March 2025.
-
Block Graph Neural Networks for tumor heterogeneity prediction
Authors:
Marianne Abémgnigni Njifon,
Tobias Weber,
Viktor Bezborodov,
Tyll Krueger,
Dominic Schuhmacher
Abstract:
Accurate tumor classification is essential for selecting effective treatments, but current methods have limitations. Standard tumor grading, which categorizes tumors based on cell differentiation, is not recommended as a stand-alone procedure, as some well-differentiated tumors can be malignant. Tumor heterogeneity assessment via single-cell sequencing offers profound insights but can be costly and may still require significant manual intervention. Many existing statistical machine learning methods for tumor data still require complex pre-processing of MRI and histopathological data.
In this paper, we propose to build on a mathematical model that simulates tumor evolution (Ożański (2017)) and generate artificial datasets for tumor classification. Tumor heterogeneity is estimated using normalized entropy, with a threshold to classify tumors as having high or low heterogeneity. Our contributions are threefold: (1) the cut and graph generation processes from the artificial data, (2) the design of tumor features, and (3) the construction of Block Graph Neural Networks (BGNN), a Graph Neural Network-based approach to predict tumor heterogeneity. The experimental results reveal that the combination of the proposed features and models yields excellent results on artificially generated data ($89.67\%$ accuracy on the test data). In particular, in alignment with the emerging trends in AI-assisted grading and spatial transcriptomics, our results suggest that enriching traditional grading methods with birth (e.g., Ki-67 proliferation index) and death markers can improve heterogeneity prediction and enhance tumor classification.
Submitted 8 February, 2025;
originally announced February 2025.
-
Advancing Geometry with AI: Multi-agent Generation of Polytopes
Authors:
Grzegorz Swirszcz,
Adam Zsolt Wagner,
Geordie Williamson,
Sam Blackwell,
Bogdan Georgiev,
Alex Davies,
Ali Eslami,
Sebastien Racaniere,
Theophane Weber,
Pushmeet Kohli
Abstract:
Polytopes are one of the most primitive concepts underlying geometry. Discovery and study of polytopes with complex structures provides a means of advancing scientific knowledge. Construction of polytopes with specific extremal structure is very difficult and time-consuming. Having an automated tool for the generation of such extremal examples is therefore of great value. We present an Artificial Intelligence system capable of generating novel polytopes with very high complexity, whose abilities we demonstrate in three different and challenging scenarios: the Hirsch Conjecture, the k-neighbourly problem and the longest monotone paths problem. For each of these three problems the system was able to generate novel examples, which match or surpass the best previously known bounds. Our main focus was the Hirsch Conjecture, which had remained an open problem for over 50 years. The highly parallel A.I. system presented in this paper was able to generate millions of examples, with many of them surpassing the best previously known results and possessing properties not present in the earlier human-constructed examples. For comparison, it took leading human experts over 50 years to handcraft the first example of a polytope exceeding the bound conjectured by Hirsch, and in the decade since, humans were able to construct only a scarce few families of such counterexample polytopes. With the adoption of computer-aided methods, the creation of new examples of mathematical objects stops being a domain reserved only for human expertise. Advances in A.I. provide mathematicians with yet another powerful tool in advancing mathematical knowledge. The results presented demonstrate that A.I. is capable of addressing problems in geometry recognized as extremely hard, and of producing extremal examples different in nature from the ones constructed by humans.
Submitted 30 January, 2025;
originally announced February 2025.
-
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
Authors:
Chris Kolb,
Tobias Weber,
Bernd Bischl,
David Rügamer
Abstract:
Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging due to the non-differentiability of penalties like the $L_1$ norm, which is incompatible with stochastic gradient descent. A promising alternative is shallow weight factorization, where weights are decomposed into two factors, allowing for smooth optimization of $L_1$-penalized neural networks by adding differentiable $L_2$ regularization to the factors. In this work, we introduce deep weight factorization, extending previous shallow approaches to more than two factors. We theoretically establish equivalence of our deep factorization with non-convex sparse regularization and analyze its impact on training dynamics and optimization. Due to the limitations posed by standard training practices, we propose a tailored initialization scheme and identify important learning rate requirements necessary for training factorized networks. We demonstrate the effectiveness of our deep weight factorization through experiments on various architectures and datasets, consistently outperforming its shallow counterpart and widely used pruning methods.
Submitted 7 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor
Authors:
Zhikun Wu,
Thomas Weber,
Florian Müller
Abstract:
Collaboration has been shown to enhance creativity, leading to more innovative and effective outcomes. While previous research has explored the abilities of Large Language Models (LLMs) to serve as co-creative partners in tasks like writing poetry or creating narratives, the collaborative potential of LLMs in humor-rich and culturally nuanced domains remains an open question. To address this gap, we conducted a user study to explore the potential of LLMs in co-creating memes - a humor-driven and culturally specific form of creative expression. The study comprised three groups of 50 participants each: a human-only group creating memes without AI assistance, a human-AI collaboration group interacting with a state-of-the-art LLM, and an AI-only group where the LLM autonomously generated memes. We assessed the quality of the generated memes through crowdsourcing, with each meme rated on creativity, humor, and shareability. Our results showed that LLM assistance increased the number of ideas generated and reduced the effort participants felt. However, it did not improve the quality of the memes when humans collaborated with the LLM. Interestingly, memes created entirely by AI performed better than both human-only and human-AI collaborative memes in all areas on average. However, when looking at the top-performing memes, human-created ones were better in humor, while human-AI collaborations stood out in creativity and shareability. These findings highlight the complexities of human-AI collaboration in creative tasks. While AI can boost productivity and create content that appeals to a broad audience, human creativity remains crucial for content that connects on a deeper level.
Submitted 23 January, 2025; v1 submitted 20 January, 2025;
originally announced January 2025.
-
A Realistic Collimated X-Ray Image Simulation Pipeline
Authors:
Benjamin El-Zein,
Dominik Eckert,
Thomas Weber,
Maximilian Rohleder,
Ludwig Ritschl,
Steffen Kappler,
Andreas Maier
Abstract:
Collimator detection remains a challenging task in X-ray systems with unreliable or unavailable information about the detector's position relative to the source. This paper presents a physically motivated image processing pipeline for simulating the characteristics of collimator shadows in X-ray images. By generating randomized labels for collimator shapes and locations, incorporating scattered radiation simulation, and including Poisson noise, the pipeline enables the expansion of limited datasets for training deep neural networks. We validate the proposed pipeline by a qualitative and quantitative comparison against real collimator shadows. Furthermore, we demonstrate that utilizing simulated data within our deep learning framework not only serves as a suitable substitute for actual collimators but also enhances the generalization performance when applied to real-world data.
Submitted 15 November, 2024;
originally announced November 2024.
-
Chattronics: using GPTs to assist in the design of data acquisition systems
Authors:
Jonathan Paul Driemeyer Brown,
Tiago Oliveira Weber
Abstract:
The usefulness of Large Language Models (LLMs) is being continuously tested in various fields. However, their intrinsic linguistic characteristic is still one of the limiting factors when applying these models to exact sciences. In this article, a novel approach to using Generative Pre-trained Transformers (GPTs) to assist in the design phase of data acquisition systems is presented. The solution is packaged in the form of an application that retains the conversational aspects of LLMs, in such a manner that the user must provide details on the desired project in order for the model to draft both a system-level architectural diagram and the block-level specifications, following a Top-Down methodology based on restrictions. To test this tool, two distinct user emulations were used, one of which uses an additional GPT model. In total, 4 different data acquisition projects were used in the testing phase, each with its own measurement requirements: angular position, temperature, acceleration and a fourth project with both pressure and superficial temperature measurements. After 160 test iterations, the study concludes that there is potential for these models to serve adequately as synthesis/assistant tools for data acquisition systems, but there are still technological limitations. The results show coherent architectures and topologies, but also that GPTs have difficulty considering all requirements simultaneously and often commit theoretical mistakes.
Submitted 23 September, 2024;
originally announced September 2024.
-
A novel fusion of Sentinel-1 and Sentinel-2 with climate data for crop phenology estimation using Machine Learning
Authors:
Shahab Aldin Shojaeezadeh,
Abdelrazek Elnashar,
Tobias Karl David Weber
Abstract:
Crop phenology describes the physiological development stages of crops from planting to harvest, which is valuable information for decision makers to plan and adapt agricultural management strategies. In the era of big Earth observation data ubiquity, attempts have been made to accurately detect crop phenology using Remote Sensing (RS) and high-resolution weather data. However, most studies have focused on large-scale predictions of phenology or developed methods which are not adequate to help crop modeler communities leverage Sentinel-1 and Sentinel-2 data and fuse them with high-resolution climate data using a novel framework. For this, we trained a Machine Learning (ML) LightGBM model to predict 13 phenological stages for eight major crops across Germany at 20 m scale. Observed phenologies were taken from the German national phenology network (German Meteorological Service; DWD) between 2017 and 2021. We proposed a thorough feature selection analysis to find the best combination of RS and climate data to detect phenological stages. At national scale, predicted phenology resulted in a reasonable precision of $R^2 > 0.43$ and a low Mean Absolute Error of 6 days, averaged over all phenological stages and crops. The spatio-temporal analysis of the model predictions demonstrates its transferability across different spatial and temporal contexts in Germany. The results indicated that combining radar sensors with climate data yields a very promising performance for a multitude of practical applications. Moreover, these improvements are expected to be useful to generate highly valuable input for crop model calibrations and evaluations, facilitate informed agricultural decisions, and contribute to sustainable food production to address the increasing global food demand.
Submitted 12 May, 2025; v1 submitted 16 August, 2024;
originally announced September 2024.
-
Neural Network Tire Force Modeling for Automated Drifting
Authors:
Nicholas Drake Broadbent,
Trey Weber,
Daiki Mori,
J. Christian Gerdes
Abstract:
Automated drifting presents a challenge problem for vehicle control, requiring models and control algorithms that can precisely handle nonlinear, coupled tire forces at the friction limits. We present a neural network architecture for predicting front tire lateral force as a drop-in replacement for physics-based approaches. With a full-scale automated vehicle purpose-built for the drifting application, we deploy these models in a nonlinear model predictive controller tuned for tracking a reference drifting trajectory, for direct comparisons of model performance. The neural network tire model exhibits significantly improved path tracking performance over the brush tire model in cases where front-axle braking force is applied, suggesting the neural network's ability to express previously unmodeled, latent dynamics in the drifting condition.
Submitted 18 July, 2024;
originally announced July 2024.
-
From Computational to Conversational Notebooks
Authors:
Thomas Weber,
Sven Mayer
Abstract:
Today, we see a drastic increase in LLM-based user interfaces to support users in various tasks. Also, in programming, we witness a productivity boost with features like LLM-supported code completion and conversational agents to generate code. In this work, we look at the future of computational notebooks by enriching them with LLM support. We propose a spectrum of support, from simple inline code completion to executable code that was the output of a conversation. We showcase five concrete examples for potential user interface designs and discuss their benefits and drawbacks. With this, we hope to inspire the future development of LLM-supported computational notebooks.
Submitted 15 June, 2024;
originally announced June 2024.
-
Linearization Turns Neural Operators into Function-Valued Gaussian Processes
Authors:
Emilia Magnani,
Marvin Pförtner,
Tobias Weber,
Philipp Hennig
Abstract:
Neural operators generalize neural networks to learn mappings between function spaces from data. They are commonly used to learn solution operators of parametric partial differential equations (PDEs) or propagators of time-dependent PDEs. However, to make them useful in high-stakes simulation scenarios, their inherent predictive error must be quantified reliably. We introduce LUNO, a novel framework for approximate Bayesian uncertainty quantification in trained neural operators. Our approach leverages model linearization to push (Gaussian) weight-space uncertainty forward to the neural operator's predictions. We show that this can be interpreted as a probabilistic version of the concept of currying from functional programming, yielding a function-valued (Gaussian) random process belief. Our framework provides a practical yet theoretically sound way to apply existing Bayesian deep learning methods such as the linearized Laplace approximation to neural operators. Just as the underlying neural operator, our approach is resolution-agnostic by design. The method adds minimal prediction overhead, can be applied post-hoc without retraining the network, and scales to large models and datasets. We evaluate these aspects in a case study on Fourier neural operators.
Submitted 31 January, 2025; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Generalizing Orthogonalization for Models with Non-Linearities
Authors:
David Rügamer,
Chris Kolb,
Tobias Weber,
Lucas Kook,
Thomas Nagler
Abstract:
The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms' application. It was, for instance, shown that neural networks can deduce racial information solely from a patient's X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic decision-making based on this algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the "orthogonalization" or "normalization" of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method's effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.
Submitted 2 June, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition
Authors:
Tobias Weber,
Jakob Dexl,
David Rügamer,
Michael Ingrisch
Abstract:
We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. We applied Tucker decomposition to the convolutional kernels of the TotalSegmentator (TS) model, an nnU-Net model trained on a comprehensive dataset for automatic segmentation of 117 anatomical structures. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality. This study utilized the publicly available TS dataset, employing various downsampling factors to explore the relationship between model size, inference speed, and segmentation performance. The application of Tucker decomposition to the TS model substantially reduced the model parameters and FLOPs across various compression rates, with limited loss in segmentation accuracy. We removed up to 88% of the model's parameters with no significant performance changes in the majority of classes after fine-tuning. Practical benefits varied across different graphics processing unit (GPU) architectures, with more distinct speed-ups on less powerful hardware. Post-hoc network compression via Tucker decomposition presents a viable strategy for reducing the computational demand of medical image segmentation models without substantially sacrificing accuracy. This approach enables the broader adoption of advanced deep learning technologies in clinical practice, offering a way to navigate the constraints of hardware capabilities.
Submitted 18 April, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Usability and Adoption of Graphical Data-Driven Development Tools
Authors:
Thomas Weber,
Sven Mayer
Abstract:
Software development of modern, data-driven applications still relies on tools that use interaction paradigms that have remained mostly unchanged for decades. While rich forms of interactions exist as an alternative to textual command input, they find little adoption in professional software creation. In this work, we compare graphical programming using direct manipulation to the traditional, textual way of creating data-driven applications to determine the benefits and drawbacks of each. In a between-subjects user study (N=18), we compared developing a machine learning architecture with a graphical editor to traditional code-based development. While qualitative and quantitative measures show general benefits of graphical direct manipulation, the user's subjective perception does not always match this. Participants were aware of the possible benefits of such tools but were still biased in their perception. Our findings highlight that alternative software creation tools cannot just rely on good usability but must emphasize the demands of their specific target group, e.g. user control and flexibility, if they want long-term benefits and adoption.
Submitted 9 November, 2023;
originally announced November 2023.
-
Post-hoc Orthogonalization for Mitigation of Protected Feature Bias in CXR Embeddings
Authors:
Tobias Weber,
Michael Ingrisch,
Bernd Bischl,
David Rügamer
Abstract:
Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Apart from removing any influence on pathology classification, while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray image classification.
Submitted 11 June, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Adversarial Anomaly Detection using Gaussian Priors and Nonlinear Anomaly Scores
Authors:
Fiete Lüer,
Tobias Weber,
Maxim Dolgich,
Christian Böhm
Abstract:
Anomaly detection in imbalanced datasets is a frequent and crucial problem, especially in the medical domain where retrieving and labeling irregularities is often expensive. By combining the generative stability of a $β$-variational autoencoder (VAE) with the discriminative strengths of generative adversarial networks (GANs), we propose a novel model, $β$-VAEGAN. We investigate methods for composing anomaly scores based on the discriminative and reconstructive capabilities of our model. Existing work focuses on linear combinations of these components to determine if data is anomalous. We advance existing work by training a kernelized support vector machine (SVM) on the respective error components to also consider nonlinear relationships. This improves anomaly detection performance, while allowing faster optimization. Lastly, we use the deviations from the Gaussian prior of $β$-VAEGAN to form a novel anomaly score component. In comparison to state-of-the-art work, we improve the $F_1$ score during anomaly detection from 0.85 to 0.92 on the widely used MITBIH Arrhythmia Database.
Submitted 27 October, 2023;
originally announced October 2023.
-
The Emotional Dilemma: Influence of a Human-like Robot on Trust and Cooperation
Authors:
Dennis Becker,
Diana Rueda,
Felix Beese,
Brenda Scarleth Gutierrez Torres,
Myriem Lafdili,
Kyra Ahrens,
Di Fu,
Erik Strahl,
Tom Weber,
Stefan Wermter
Abstract:
Increasing anthropomorphic robot behavioral design could affect trust and cooperation positively. However, studies have shown contradicting results and suggest a task-dependent relationship between robots that display emotions and trust. Therefore, this study analyzes the effect of robots that display human-like emotions on trust, cooperation, and participants' emotions. In the between-group study, participants play the coin entrustment game with an emotional and a non-emotional robot. The results show that the robot that displays emotions induces more anxiety than the neutral robot. Accordingly, the participants trust the emotional robot less and are less likely to cooperate. Furthermore, the perceived intelligence of a robot increases trust, while a desire to outcompete the robot can reduce trust and cooperation. Thus, the design of robots expressing emotions should be task dependent to avoid adverse effects that reduce trust and cooperation.
Submitted 6 July, 2023;
originally announced July 2023.
-
Constrained Probabilistic Mask Learning for Task-specific Undersampled MRI Reconstruction
Authors:
Tobias Weber,
Michael Ingrisch,
Bernd Bischl,
David Rügamer
Abstract:
Undersampling is a common method in Magnetic Resonance Imaging (MRI) to subsample the number of data points in k-space, reducing acquisition times at the cost of decreased image quality. A popular approach is to employ undersampling patterns following various strategies, e.g., variable density sampling or radial trajectories. In this work, we propose a method that directly learns the undersampling masks from data points, thereby also providing task- and domain-specific patterns. To solve the resulting discrete optimization problem, we propose a general optimization routine called ProM: A fully probabilistic, differentiable, versatile, and model-free framework for mask optimization that enforces acceleration factors through a convex constraint. Analyzing knee, brain, and cardiac MRI datasets with our method, we discover that different anatomic regions reveal distinct optimal undersampling masks, demonstrating the benefits of using custom masks, tailored for a downstream task. For example, ProM can create undersampling masks that maximize performance in downstream tasks like segmentation with networks trained on fully-sampled MRIs. Even with extreme acceleration factors, ProM yields reasonable performance while being more versatile than existing methods, paving the way for data-driven all-purpose mask generation.
Submitted 22 August, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Map-based Experience Replay: A Memory-Efficient Solution to Catastrophic Forgetting in Reinforcement Learning
Authors:
Muhammad Burhan Hafez,
Tilman Immisch,
Tom Weber,
Stefan Wermter
Abstract:
Deep Reinforcement Learning agents often suffer from catastrophic forgetting, forgetting previously found solutions in parts of the input space when training on new data. Replay Memories are a common solution to the problem, decorrelating and shuffling old and new training samples. They naively store state transitions as they come in, without regard for redundancy. We introduce a novel cognitive-inspired replay memory approach based on the Grow-When-Required (GWR) self-organizing network, which resembles a map-based mental model of the world. Our approach organizes stored transitions into a concise environment-model-like network of state-nodes and transition-edges, merging similar samples to reduce the memory size and increase pair-wise distance among samples, which increases the relevancy of each sample. Overall, our paper shows that map-based experience replay allows for significant memory reduction with only small performance decreases.
Submitted 28 August, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
-
DiscoGen: Learning to Discover Gene Regulatory Networks
Authors:
Nan Rosemary Ke,
Sara-Jane Dunn,
Jorg Bornschein,
Silvia Chiappa,
Melanie Rey,
Jean-Baptiste Lespiau,
Albin Cassirer,
Jane Wang,
Theophane Weber,
David Barrett,
Matthew Botvinick,
Anirudh Goyal,
Mike Mozer,
Danilo Rezende
Abstract:
Accurately inferring Gene Regulatory Networks (GRNs) is a critical and challenging task in biology. GRNs model the activatory and inhibitory interactions between genes and are inherently causal in nature. To accurately identify GRNs, perturbational data is required. However, most GRN discovery methods only operate on observational data. Recent advances in neural network-based causal discovery methods have significantly improved causal discovery, including handling interventional data, improvements in performance and scalability. However, applying state-of-the-art (SOTA) causal discovery methods in biology poses challenges, such as noisy data and a large number of samples. Thus, adapting the causal discovery methods is necessary to handle these challenges. In this paper, we introduce DiscoGen, a neural network-based GRN discovery method that can denoise gene expression measurements and handle interventional data. We demonstrate that our model outperforms SOTA neural network-based causal discovery methods.
Submitted 12 April, 2023;
originally announced April 2023.
-
Automated wildlife image classification: An active learning tool for ecological applications
Authors:
Ludwig Bothmann,
Lisa Wimmer,
Omid Charrakh,
Tobias Weber,
Hendrik Edelhoff,
Wibke Peters,
Hien Nguyen,
Caryl Benjamin,
Annette Menzel
Abstract:
Wildlife camera trap images are being used extensively to investigate animal abundance, habitat associations, and behavior, which is complicated by the fact that experts must first classify the images manually. Artificial intelligence systems can take over this task but usually need a large number of already-labeled training images to achieve sufficient performance. This requirement necessitates human expert labor and poses a particular challenge for projects with few cameras or short durations. We propose a label-efficient learning strategy that enables researchers with small or medium-sized image databases to leverage the potential of modern machine learning, thus freeing crucial resources for subsequent analyses.
Our methodological proposal is two-fold: (1) We improve current strategies of combining object detection and image classification by tuning the hyperparameters of both models. (2) We provide an active learning (AL) system that allows training deep learning models very efficiently in terms of required human-labeled training images. We supply a software package that enables researchers to use these methods directly and thereby ensure the broad applicability of the proposed framework in ecological practice.
We show that our tuning strategy improves predictive performance. We demonstrate how the AL pipeline reduces the amount of pre-labeled data needed to achieve a specific predictive performance and that it is especially valuable for improving out-of-sample predictive performance.
We conclude that the combination of tuning and AL increases predictive performance substantially. Furthermore, we argue that our work can broadly impact the community through the ready-to-use software package provided. Finally, the publication of our models tailored to European wildlife data enriches existing model bases mostly trained on data from Africa and North America.
Submitted 2 August, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Motion Planning for Triple-Axis Spectrometers
Authors:
Tobias Weber
Abstract:
We present the free and open source software TAS-Paths, a novel system which calculates optimal, collision-free paths for the movement of triple-axis spectrometers. The software features an easy to use graphical user interface, but can also be scripted and used as a library. It allows the user to plan and visualise the motion of the instrument before the experiment and can be used during measurements to circumvent obstacles. The instrument path is calculated in angular configuration space in order to keep a maximum angular distance from any obstacle.
Submitted 24 March, 2023;
originally announced March 2023.
-
Cascaded Latent Diffusion Models for High-Resolution Chest X-ray Synthesis
Authors:
Tobias Weber,
Michael Ingrisch,
Bernd Bischl,
David Rügamer
Abstract:
While recent advances in large-scale foundational models show promising results, their application to the medical domain has not yet been explored in detail. In this paper, we progress into the realms of large-scale modeling in medical synthesis by proposing Cheff - a foundational cascaded latent diffusion model, which generates highly-realistic chest radiographs providing state-of-the-art quality on a 1-megapixel scale. We further propose MaCheX, which is a unified interface for public chest datasets and forms the largest open collection of chest X-rays up to date. With Cheff conditioned on radiological reports, we further guide the synthesis process over text prompts and unveil the research area of report-to-chest-X-ray generation.
Submitted 20 March, 2023;
originally announced March 2023.
-
Equivariant MuZero
Authors:
Andreea Deac,
Théophane Weber,
George Papamakarios
Abstract:
Deep reinforcement learning repeatedly succeeds in closed, well-defined domains such as games (Chess, Go, StarCraft). The next frontier is real-world scenarios, where setups are numerous and varied. For this, agents need to learn the underlying rules governing the environment, so as to robustly generalise to conditions that differ from those they were trained on. Model-based reinforcement learning algorithms, such as the highly successful MuZero, aim to accomplish this by learning a world model. However, leveraging a world model has not consistently shown greater generalisation capabilities compared to model-free alternatives. In this work, we propose improving the data efficiency and generalisation capabilities of MuZero by explicitly incorporating the symmetries of the environment in its world-model architecture. We prove that, so long as the neural networks used by MuZero are equivariant to a particular symmetry group acting on the environment, the entirety of MuZero's action-selection algorithm will also be equivariant to that group. We evaluate Equivariant MuZero on procedurally-generated MiniPacman and on Chaser from the ProcGen suite: training on a set of mazes, and then testing on unseen rotated versions, demonstrating the benefits of equivariance. Further, we verify that our performance improvements hold even when only some of the components of Equivariant MuZero obey strict equivariance, which highlights the robustness of our construction.
Submitted 9 February, 2023;
originally announced February 2023.
-
Investigating the role of model-based learning in exploration and transfer
Authors:
Jacob Walker,
Eszter Vértes,
Yazhe Li,
Gabriel Dulac-Arnold,
Ankesh Anand,
Théophane Weber,
Jessica B. Hamrick
Abstract:
State of the art reinforcement learning has enabled training agents on tasks of ever increasing complexity. However, the current paradigm tends to favor training agents from scratch on every new task or on collections of tasks with a view towards generalizing to novel task configurations. The former suffers from poor data efficiency while the latter is difficult when test tasks are out-of-distribution. Agents that can effectively transfer their knowledge about the world pose a potential solution to these issues. In this paper, we investigate transfer learning in the context of model-based agents. Specifically, we aim to understand when exactly environment models have an advantage and why. We find that a model-based approach outperforms controlled model-free baselines for transfer learning. Through ablations, we show that both the policy and dynamics model learnt through exploration matter for successful transfer. We demonstrate our results across three domains which vary in their requirements for transfer: in-distribution procedural (Crafter), in-distribution identical (RoboDesk), and out-of-distribution (Meta-World). Our results show that intrinsic exploration combined with environment models presents a viable direction towards agents that are self-supervised and able to generalize to novel reward functions.
Submitted 8 February, 2023;
originally announced February 2023.
-
Laser: Latent Set Representations for 3D Generative Modeling
Authors:
Pol Moreno,
Adam R. Kosiorek,
Heiko Strathmann,
Daniel Zoran,
Rosalia G. Schneider,
Björn Winckler,
Larisa Markeeva,
Théophane Weber,
Danilo J. Rezende
Abstract:
NeRF provides unparalleled fidelity of novel view synthesis: rendering a 3D scene from an arbitrary viewpoint. NeRF requires training on a large number of views that fully cover a scene, which limits its applicability. While these issues can be addressed by learning a prior over scenes in various forms, previous approaches have either been applied to overly simple scenes or have struggled to render unobserved parts. We introduce Laser-NV: a generative model which achieves high modelling capacity, and which is based on a set-valued latent representation modelled by normalizing flows. Similarly to previous amortized approaches, Laser-NV learns structure from multiple scenes and is capable of fast, feed-forward inference from few views. To encourage higher rendering fidelity and consistency with observed views, Laser-NV further incorporates a geometry-informed attention mechanism over the observed views. Laser-NV further produces diverse and plausible completions of occluded parts of a scene while remaining consistent with observations. Laser-NV shows state-of-the-art novel-view synthesis quality when evaluated on ShapeNet and on a novel simulated City dataset, which features high uncertainty in the unobserved regions of the scene.
Submitted 13 January, 2023;
originally announced January 2023.
-
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports
Authors:
Katharina Jeblick,
Balthasar Schachtner,
Jakob Dexl,
Andreas Mittermeier,
Anna Theresa Stüber,
Johanna Topalis,
Tobias Weber,
Philipp Wesp,
Bastian Sabel,
Jens Ricke,
Michael Ingrisch
Abstract:
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
Submitted 30 December, 2022;
originally announced December 2022.
-
Large-Scale Retrieval for Reinforcement Learning
Authors:
Peter C. Humphreys,
Arthur Guez,
Olivier Tieleman,
Laurent Sifre,
Théophane Weber,
Timothy Lillicrap
Abstract:
Effective decision making involves flexibly relating past experiences and relevant contextual information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm is for an agent to amortise information that helps decision making into its network weights via gradient descent on training losses. Here, we pursue an alternative approach in which agents can utilise large-scale context sensitive database lookups to support their parametric computations. This allows agents to directly learn in an end-to-end manner to utilise relevant information to inform their outputs. In addition, new information can be attended to by the agent, without retraining, by simply augmenting the retrieval dataset. We study this approach for offline RL in 9x9 Go, a challenging game for which the vast combinatorial state space privileges generalisation over direct matching to past experiences. We leverage fast, approximate nearest neighbor techniques in order to retrieve relevant data from a set of tens of millions of expert demonstration states. Attending to this information provides a significant boost to prediction accuracy and game-play performance over simply using these demonstrations as training trajectories, providing a compelling demonstration of the value of large-scale retrieval in offline RL agents.
Submitted 16 December, 2022; v1 submitted 10 June, 2022;
originally announced June 2022.
-
GASP: Gated Attention For Saliency Prediction
Authors:
Fares Abawi,
Tom Weber,
Stefan Wermter
Abstract:
Saliency prediction refers to the computational task of modeling overt attention. Social cues greatly influence our attention, consequently altering our eye movements and behavior. To emphasize the efficacy of such features, we present a neural model for integrating social cues and weighting their influences. Our model consists of two stages. During the first stage, we detect two social cues by following gaze, estimating gaze direction, and recognizing affect. These features are then transformed into spatiotemporal maps through image processing operations. The transformed representations are propagated to the second stage (GASP) where we explore various techniques of late fusion for integrating social cues and introduce two sub-networks for directing attention to relevant stimuli. Our experiments indicate that fusion approaches achieve better results for static integration methods, whereas non-fusion approaches for which the influence of each modality is unknown, result in better outcomes when coupled with recurrent models for dynamic saliency prediction. We show that gaze direction and affective representations contribute a prediction to ground-truth correspondence improvement of at least 5% compared to dynamic saliency models without social cues. Furthermore, affective representations improve GASP, supporting the necessity of considering affect-biased attention in predicting saliency.
Submitted 9 June, 2022;
originally announced June 2022.
-
Learning to Induce Causal Structure
Authors:
Nan Rosemary Ke,
Silvia Chiappa,
Jane Wang,
Anirudh Goyal,
Jorg Bornschein,
Melanie Rey,
Theophane Weber,
Matthew Botvinick,
Michael Mozer,
Danilo Jimenez Rezende
Abstract:
The fundamental challenge in causal induction is to infer the underlying graph structure given observational and/or interventional data. Most existing causal induction algorithms operate by generating candidate graphs and evaluating them using either score-based methods (including continuous optimization) or independence tests. In our work, we instead treat the inference process as a black box and design a neural network architecture that learns the mapping from both observational and interventional data to graph structures via supervised training on synthetic graphs. The learned model generalizes to new synthetic graphs, is robust to train-test distribution shifts, and achieves state-of-the-art performance on naturalistic graphs for low sample complexity.
Submitted 7 October, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Explain yourself! Effects of Explanations in Human-Robot Interaction
Authors:
Jakob Ambsdorf,
Alina Munir,
Yiyao Wei,
Klaas Degkwitz,
Harm Matthias Harms,
Susanne Stannek,
Kyra Ahrens,
Dennis Becker,
Erik Strahl,
Tom Weber,
Stefan Wermter
Abstract:
Recent developments in explainable artificial intelligence promise the potential to transform human-robot interaction: Explanations of robot decisions could affect user perceptions, justify their reliability, and increase trust. However, the effects on human perceptions of robots that explain their decisions have not been studied thoroughly. To analyze the effect of explainable robots, we conduct a study in which two simulated robots play a competitive board game. While one robot explains its moves, the other robot only announces them. Providing explanations for its actions was not sufficient to change the perceived competence, intelligence, likeability or safety ratings of the robot. However, the results show that the robot that explains its moves is perceived as more lively and human-like. This study demonstrates the need for and potential of explainable human-robot interaction and the wider assessment of its effects as a novel research direction.
Submitted 14 June, 2022; v1 submitted 9 April, 2022;
originally announced April 2022.
-
Retrieval-Augmented Reinforcement Learning
Authors:
Anirudh Goyal,
Abram L. Friesen,
Andrea Banino,
Theophane Weber,
Nan Rosemary Ke,
Adria Puigdomenech Badia,
Arthur Guez,
Mehdi Mirza,
Peter C. Humphreys,
Ksenia Konyushkova,
Laurent Sifre,
Michal Valko,
Simon Osindero,
Timothy Lillicrap,
Nicolas Heess,
Charles Blundell
Abstract:
Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a retrieval process (parameterized as a neural network) that has direct access to a dataset of experiences. This dataset can come from the agent's past experiences, expert demonstrations, or any other relevant source. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently. The proposed method facilitates learning agents that at test-time can condition their behavior on the entire dataset and not only the current state, or current trajectory. We integrate our method into two different RL agents: an offline DQN agent and an online R2D2 agent. In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores. We run extensive ablations to measure the contributions of the components of our proposed method.
Submitted 24 May, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
-
Ethically aligned Deep Learning: Unbiased Facial Aesthetic Prediction
Authors:
Michael Danner,
Thomas Weber,
Leping Peng,
Tobias Gerlach,
Xueping Su,
Matthias Rätsch
Abstract:
Facial beauty prediction (FBP) aims to develop a machine that automatically assesses facial attractiveness. In the past, those results were highly correlated with human ratings, and therefore also with the raters' bias in annotating. As artificial intelligence can have racist and discriminatory tendencies, the cause of skews in the data must be identified. Development of training data and AI algorithms that are robust against biased information is a new challenge for scientists. As aesthetic judgement is usually biased, we want to take it one step further and propose an Unbiased Convolutional Neural Network for FBP. While it is possible to create network models that can rate attractiveness of faces on a high level, from an ethical point of view, it is equally important to make sure the model is unbiased. In this work, we introduce AestheticNet, a state-of-the-art attractiveness prediction network, which significantly outperforms competitors with a Pearson Correlation of 0.9601. Additionally, we propose a new approach for generating a bias-free CNN to improve fairness in machine learning.
Submitted 9 November, 2021;
originally announced November 2021.
-
Procedural Generalization by Planning with Self-Supervised World Models
Authors:
Ankesh Anand,
Jacob Walker,
Yazhe Li,
Eszter Vértes,
Julian Schrittwieser,
Sherjil Ozair,
Théophane Weber,
Jessica B. Hamrick
Abstract:
One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization. Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis on MuZero (Schrittwieser et al., 2020), a powerful model-based agent, and evaluate its performance on both procedural and task generalization. We identify three factors of procedural generalization -- planning, self-supervised representation learning, and procedural data diversity -- and show that by combining these techniques, we achieve state-of-the-art generalization performance and data efficiency on Procgen (Cobbe et al., 2019). However, we find that these factors do not always provide the same benefits for the task generalization benchmarks in Meta-World (Yu et al., 2019), indicating that transfer remains a challenge and may require different approaches than procedural generalization. Overall, we suggest that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.
Submitted 2 November, 2021;
originally announced November 2021.
-
Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation
Authors:
Tobias Weber,
Michael Ingrisch,
Bernd Bischl,
David Rügamer
Abstract:
The application of deep learning in survival analysis (SA) allows utilizing unstructured and high-dimensional data types uncommon in traditional survival methods. This advances methods in fields such as digital health, predictive maintenance, and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that models hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.
Submitted 17 November, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Survival-oriented embeddings for improving accessibility to complex data structures
Authors:
Tobias Weber,
Michael Ingrisch,
Matthias Fabritius,
Bernd Bischl,
David Rügamer
Abstract:
Deep learning excels in the analysis of unstructured data, and recent advancements allow these techniques to be extended to survival analysis. In the context of clinical radiology, this makes it possible, e.g., to relate unstructured volumetric images to a risk score or a prognosis of life expectancy and to support clinical decision making. Medical applications are, however, associated with high criticality and consequently, neither medical personnel nor patients usually accept black-box models as a reason or basis for decisions. Apart from averseness to new technologies, this is due to missing interpretability, transparency and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant in healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.
Submitted 3 November, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
The future of human-AI collaboration: a taxonomy of design knowledge for hybrid intelligence systems
Authors:
Dominik Dellermann,
Adrian Calma,
Nikolaus Lipusch,
Thorsten Weber,
Sascha Weigel,
Philipp Ebel
Abstract:
Recent technological advances, especially in the field of machine learning, provide astonishing progress on the road towards artificial general intelligence. However, tasks in current real-world business applications cannot yet be solved by machines alone. We, therefore, identify the need for developing socio-technological ensembles of humans and machines. Such systems possess the ability to accomplish complex goals by combining human and artificial intelligence to collectively achieve superior results and continuously improve by learning from each other. Thus, the need for structured design knowledge for those systems arises. Following a taxonomy development method, this article provides three main contributions: First, we present a structured overview of interdisciplinary research on the role of humans in the machine learning pipeline. Second, we envision hybrid intelligence systems and conceptualize the relevant dimensions for system design for the first time. Finally, we offer useful guidance for system developers during the implementation of such applications.
Submitted 7 May, 2021;
originally announced May 2021.
-
Muesli: Combining Improvements in Policy Optimization
Authors:
Matteo Hessel,
Ivo Danihelka,
Fabio Viola,
Arthur Guez,
Simon Schmitt,
Laurent Sifre,
Theophane Weber,
David Silver,
Hado van Hasselt
Abstract:
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
Submitted 31 March, 2022; v1 submitted 13 April, 2021;
originally announced April 2021.
-
Synthetic Returns for Long-Term Credit Assignment
Authors:
David Raposo,
Sam Ritter,
Adam Santoro,
Greg Wayne,
Theophane Weber,
Matt Botvinick,
Hado van Hasselt,
Francis Song
Abstract:
Since the earliest days of reinforcement learning, the workhorse method for assigning credit to actions over time has been temporal-difference (TD) learning, which propagates credit backward timestep-by-timestep. This approach suffers when delays between actions and rewards are long and when intervening unrelated events contribute variance to long-term returns. We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two. In this work, we use SA-learning to model the contribution of past states to the current reward. With this model we can predict each state's contribution to the far future, a quantity we call "synthetic returns". TD-learning can then be applied to select actions that maximize these synthetic returns (SRs). We demonstrate the effectiveness of augmenting agents with SRs across a range of tasks on which TD-learning alone fails. We show that the learned SRs are interpretable: they spike for states that occur after critical actions are taken. Finally, we show that our IMPALA-based SR agent solves Atari Skiing -- a game with a lengthy reward delay that posed a major hurdle to deep-RL agents -- 25 times faster than the published state-of-the-art.
Submitted 24 February, 2021;
originally announced February 2021.
-
Hierarchical Learning Using Deep Optimum-Path Forest
Authors:
Luis C. S. Afonso,
Clayton R. Pereira,
Silke A. T. Weber,
Christian Hook,
Alexandre X. Falcão,
João P. Papa
Abstract:
Bag-of-Visual Words (BoVW) and deep learning techniques have been widely used in several domains, which include computer-assisted medical diagnoses. In this work, we are interested in developing tools for the automatic identification of Parkinson's disease using machine learning and the concept of BoVW. The proposed approach concerns a hierarchical-based learning technique to design visual dictionaries through the Deep Optimum-Path Forest classifier. The proposed method was evaluated in six datasets derived from data collected from individuals when performing handwriting exams. Experimental results showed the potential of the technique, with robust achievements.
Submitted 18 February, 2021;
originally announced February 2021.
-
Neural Recursive Belief States in Multi-Agent Reinforcement Learning
Authors:
Pol Moreno,
Edward Hughes,
Kevin R. McKee,
Bernardo Avila Pires,
Théophane Weber
Abstract:
In multi-agent reinforcement learning, the problem of learning to act is particularly difficult because the policies of co-players may be heavily conditioned on information only observed by them. On the other hand, humans readily form beliefs about the knowledge possessed by their peers and leverage beliefs to inform decision-making. Such abilities underlie individual success in a wide range of Markov games, from bluffing in Poker to conditional cooperation in the Prisoner's Dilemma, to convention-building in Bridge. Classical methods are usually not applicable to complex domains due to the intractable nature of hierarchical beliefs (i.e. beliefs of other agents' beliefs). We propose a scalable method to approximate these belief structures using recursive deep generative models, and to use the belief models to obtain representations useful to acting in complex tasks. Our agents trained with belief models outperform model-free baselines with equivalent representational capacity using common training paradigms. We also show that higher-order belief models outperform agents with lower-order models.
Submitted 3 February, 2021;
originally announced February 2021.
-
Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use
Authors:
Brian Lee,
Brandi Dupervil,
Nicholas P. Deputy,
Wil Duck,
Stephen Soroka,
Lyndsay Bottichio,
Benjamin Silk,
Jason Price,
Patricia Sweeney,
Jennifer Fuld,
Todd Weber,
Dan Pollock
Abstract:
Objectives: Federal open data initiatives that promote increased sharing of federally collected data are important for transparency, data quality, trust, and relationships with the public and state, tribal, local, and territorial (STLT) partners. These initiatives advance understanding of health conditions and diseases by providing data to more researchers, scientists, and policymakers for analysis, collaboration, and valuable use outside CDC responders. This is particularly true for emerging conditions such as COVID-19 where we have much to learn and have evolving data needs. Since the beginning of the outbreak, CDC has collected person-level, de-identified data from jurisdictions and currently has over 8 million records, increasing each day. This paper describes how CDC designed and produces two de-identified public datasets from these collected data.
Materials and Methods: Data elements were included based on the usefulness, public request, and privacy implications; specific field values were suppressed to reduce risk of reidentification and exposure of confidential information. Datasets were created and verified for privacy and confidentiality using data management platform analytic tools as well as R scripts.
Results: Unrestricted data are available to the public through Data.CDC.gov and restricted data, with additional fields, are available with a data use agreement through a private repository on GitHub.com.
Practice Implications: Enriched understanding of the available public data, the methods used to create these data, and the algorithms used to protect privacy of de-identified individuals allow for improved data use. Automating data generation procedures allows greater and more timely sharing of data.
Submitted 13 January, 2021;
originally announced January 2021.
-
Mechanisation of Model-theoretic Conservative Extension for HOL with Ad-hoc Overloading
Authors:
Arve Gengelbach,
Johannes Åman Pohjola,
Tjark Weber
Abstract:
Definitions of new symbols merely abbreviate expressions in logical frameworks, and no new facts (regarding previously defined symbols) should hold because of a new definition. In Isabelle/HOL, definable symbols are types and constants. The latter may be ad-hoc overloaded, i.e. have different definitions for non-overlapping types. We prove that symbols that are independent of a new definition may keep their interpretation in a model extension. This work revises our earlier notion of model-theoretic conservative extension and generalises an earlier model construction. We obtain consistency of theories of definitions in higher-order logic (HOL) with ad-hoc overloading as a corollary. Our results are mechanised in the HOL4 theorem prover.
Submitted 11 January, 2021;
originally announced January 2021.
-
A case for new neural network smoothness constraints
Authors:
Mihaela Rosca,
Theophane Weber,
Arthur Gretton,
Shakir Mohamed
Abstract:
How sensitive should machine learning models be to input changes? We tackle the question of model smoothness and show that it is a useful inductive bias which aids generalization, adversarial robustness, generative modeling and reinforcement learning. We explore current methods of imposing smoothness constraints and observe that they lack the flexibility to adapt to new tasks, they don't account for data modalities, and they interact with losses, architectures and optimization in ways not yet fully understood. We conclude that new advances in the field hinge on finding ways to incorporate data, tasks and learning into our definitions of smoothness.
Submitted 7 July, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Discovering key topics from short, real-world medical inquiries via natural language processing and unsupervised learning
Authors:
Angelo Ziletti,
Christoph Berns,
Oliver Treichel,
Thomas Weber,
Jennifer Liang,
Stephanie Kammerath,
Marion Schwaerzler,
Jagatheswari Virayah,
David Ruau,
Xin Ma,
Andreas Mattern
Abstract:
Millions of unsolicited medical inquiries are received by pharmaceutical companies every year. It has been hypothesized that these inquiries represent a treasure trove of information, potentially giving insight into matters regarding medicinal products and the associated medical treatments. However, due to the large volume and specialized nature of the inquiries, it is difficult to perform timely, recurrent, and comprehensive analyses. Here, we propose a machine learning approach based on natural language processing and unsupervised learning to automatically discover key topics in real-world medical inquiries from customers. This approach requires neither ontologies nor annotations. The discovered topics are meaningful and medically relevant, as judged by medical information specialists, thus demonstrating that unsolicited medical inquiries are a source of valuable customer insights. Our work paves the way for the machine-learning-driven analysis of medical inquiries in the pharmaceutical industry, which ultimately aims at improving patient care.
Submitted 8 December, 2020;
originally announced December 2020.