-
Adjustment for Confounding using Pre-Trained Representations
Authors:
Rickmer Schulte,
David Rügamer,
Thomas Nagler
Abstract:
There is growing interest in extending average treatment effect (ATE) estimation to incorporate non-tabular data, such as images and text, which may act as sources of confounding. Neglecting these effects risks biased results and flawed scientific conclusions. However, incorporating non-tabular data necessitates sophisticated feature extractors, often in combination with ideas of transfer learning…
▽ More
There is growing interest in extending average treatment effect (ATE) estimation to incorporate non-tabular data, such as images and text, which may act as sources of confounding. Neglecting these effects risks biased results and flawed scientific conclusions. However, incorporating non-tabular data necessitates sophisticated feature extractors, often in combination with ideas of transfer learning. In this work, we investigate how latent features from pre-trained neural networks can be leveraged to adjust for sources of confounding. We formalize conditions under which these latent features enable valid adjustment and statistical inference in ATE estimation, demonstrating results along the example of double machine learning. We discuss critical challenges inherent to latent feature learning and downstream parameter estimation arising from the high dimensionality and non-identifiability of representations. Common structural assumptions for obtaining fast convergence rates with additive or sparse linear models are shown to be unrealistic for latent features. We argue, however, that neural networks are largely insensitive to these issues. In particular, we show that neural networks can achieve fast convergence rates by adapting to intrinsic notions of sparsity and dimension of the learning problem.
△ Less
Submitted 17 June, 2025;
originally announced June 2025.
-
Semi-Implicit Variational Inference via Kernelized Path Gradient Descent
Authors:
Tobias Pielok,
Bernd Bischl,
David Rügamer
Abstract:
Semi-implicit variational inference (SIVI) is a powerful framework for approximating complex posterior distributions, but training with the Kullback-Leibler (KL) divergence can be challenging due to high variance and bias in high-dimensional settings. While current state-of-the-art semi-implicit variational inference methods, particularly Kernel Semi-Implicit Variational Inference (KSIVI), have be…
▽ More
Semi-implicit variational inference (SIVI) is a powerful framework for approximating complex posterior distributions, but training with the Kullback-Leibler (KL) divergence can be challenging due to high variance and bias in high-dimensional settings. While current state-of-the-art semi-implicit variational inference methods, particularly Kernel Semi-Implicit Variational Inference (KSIVI), have been shown to work in high dimensions, training remains moderately expensive. In this work, we propose a kernelized KL divergence estimator that stabilizes training through nonparametric smoothing. To further reduce the bias, we introduce an importance sampling correction. We provide a theoretical connection to the amortized version of the Stein variational gradient descent, which estimates the score gradient via Stein's identity, showing that both methods minimize the same objective, but our semi-implicit approach achieves lower gradient variance. In addition, our method's bias in function space is benign, leading to more stable and efficient optimization. Empirical results demonstrate that our method outperforms or matches state-of-the-art SIVI methods in both performance and training efficiency.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
Revisiting Unbiased Implicit Variational Inference
Authors:
Tobias Pielok,
Bernd Bischl,
David Rügamer
Abstract:
Recent years have witnessed growing interest in semi-implicit variational inference (SIVI) methods due to their ability to rapidly generate samples from complex distributions. However, since the likelihood of these samples is non-trivial to estimate in high dimensions, current research focuses on finding effective SIVI training routines. Although unbiased implicit variational inference (UIVI) has…
▽ More
Recent years have witnessed growing interest in semi-implicit variational inference (SIVI) methods due to their ability to rapidly generate samples from complex distributions. However, since the likelihood of these samples is non-trivial to estimate in high dimensions, current research focuses on finding effective SIVI training routines. Although unbiased implicit variational inference (UIVI) has largely been dismissed as imprecise and computationally prohibitive because of its inner MCMC loop, we revisit this method and show that UIVI's MCMC loop can be effectively replaced via importance sampling and the optimal proposal distribution can be learned stably by minimizing an expected forward Kullback-Leibler divergence without bias. Our refined approach demonstrates superior performance or parity with state-of-the-art methods on established SIVI benchmarks.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals
Authors:
Marcel Arpogaus,
Thomas Kneib,
Thomas Nagler,
David Rügamer
Abstract:
Density regression models allow a comprehensive understanding of data by modeling the complete conditional probability distribution. While flexible estimation approaches such as normalizing flows (NF) work particularly well in multiple dimensions, interpreting the input-output relationship of such models is often difficult, due to the black-box character of deep learning models. In contrast, exist…
▽ More
Density regression models allow a comprehensive understanding of data by modeling the complete conditional probability distribution. While flexible estimation approaches such as normalizing flows (NF) work particularly well in multiple dimensions, interpreting the input-output relationship of such models is often difficult, due to the black-box character of deep learning models. In contrast, existing statistical methods for multivariate outcomes such as multivariate conditional transformation models (MCTM) are restricted in flexibility and are often not expressive enough to represent complex multivariate probability distributions. In this paper, we combine MCTM with state-of-the-art and autoregressive NF to leverage the transparency of MCTM for modeling interpretable feature effects on the marginal distributions in the first step and the flexibility of neural-network-based NF techniques to account for complex and non-linear relationships in the joint data distribution. We demonstrate our method's versatility in various numerical experiments and compare it with MCTM and other NF models on both simulated and real-world data.
△ Less
Submitted 12 June, 2025; v1 submitted 20 May, 2025;
originally announced May 2025.
-
Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors
Authors:
Thomas Nagler,
David Rügamer
Abstract:
Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient samplin…
▽ More
Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient sampling procedure to construct Bayesian posteriors for such estimates based on Martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the uncertainty quantification of our method in inference applications.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
How To Make Your Cell Tracker Say "I dunno!"
Authors:
Richard D. Paul,
Johannes Seiffarth,
David Rügamer,
Hanno Scharr,
Katharina Nöh
Abstract:
Cell tracking is a key computational task in live-cell microscopy, but fully automated analysis of high-throughput imaging requires reliable and, thus, uncertainty-aware data analysis tools, as the amount of data recorded within a single experiment exceeds what humans are able to overlook. We here propose and benchmark various methods to reason about and quantify uncertainty in linear assignment-b…
▽ More
Cell tracking is a key computational task in live-cell microscopy, but fully automated analysis of high-throughput imaging requires reliable and, thus, uncertainty-aware data analysis tools, as the amount of data recorded within a single experiment exceeds what humans are able to overlook. We here propose and benchmark various methods to reason about and quantify uncertainty in linear assignment-based cell tracking algorithms. Our methods take inspiration from statistics and machine learning, leveraging two perspectives on the cell tracking problem explored throughout this work: Considering it as a Bayesian inference problem and as a classification problem. Our methods admit a framework-like character in that they equip any frame-to-frame tracking method with uncertainty quantification. We demonstrate this by applying it to various existing tracking algorithms including the recently presented Transformer-based trackers. We demonstrate empirically that our methods yield useful and well-calibrated tracking uncertainties.
△ Less
Submitted 12 March, 2025;
originally announced March 2025.
-
Additive Model Boosting: New Insights and Path(ologie)s
Authors:
Rickmer Schulte,
David Rügamer
Abstract:
Additive models (AMs) have sparked a lot of interest in machine learning recently, allowing the incorporation of interpretable structures into a wide range of model classes. Many commonly used approaches to fit a wide variety of potentially complex additive models build on the idea of boosting additive models. While boosted additive models (BAMs) work well in practice, certain theoretical aspects…
▽ More
Additive models (AMs) have sparked a lot of interest in machine learning recently, allowing the incorporation of interpretable structures into a wide range of model classes. Many commonly used approaches to fit a wide variety of potentially complex additive models build on the idea of boosting additive models. While boosted additive models (BAMs) work well in practice, certain theoretical aspects are still poorly understood, including general convergence behavior and what optimization problem is being solved when accounting for the implicit regularizing nature of boosting. In this work, we study the solution paths of BAMs and establish connections with other approaches for certain classes of problems. Along these lines, we derive novel convergence results for BAMs, which yield crucial insights into the inner workings of the method. While our results generally provide reassuring theoretical evidence for the practical use of BAMs, they also uncover some ``pathologies'' of boosting for certain additive model classes concerning their convergence behavior that require caution in practice. We empirically validate our theoretical findings through several numerical experiments.
△ Less
Submitted 7 March, 2025;
originally announced March 2025.
-
Paths and Ambient Spaces in Neural Loss Landscapes
Authors:
Daniel Dold,
Julius Kobialka,
Nicolai Palm,
Emanuel Sommer,
David Rügamer,
Oliver Dürr
Abstract:
Understanding the structure of neural network loss surfaces, particularly the emergence of low-loss tunnels, is critical for advancing neural network theory and practice. In this paper, we propose a novel approach to directly embed loss tunnels into the loss landscape of neural networks. Exploring the properties of these loss tunnels offers new insights into their length and structure and sheds li…
▽ More
Understanding the structure of neural network loss surfaces, particularly the emergence of low-loss tunnels, is critical for advancing neural network theory and practice. In this paper, we propose a novel approach to directly embed loss tunnels into the loss landscape of neural networks. Exploring the properties of these loss tunnels offers new insights into their length and structure and sheds light on some common misconceptions. We then apply our approach to Bayesian neural networks, where we improve subspace inference by identifying pitfalls and proposing a more natural prior that better guides the sampling procedure.
△ Less
Submitted 5 March, 2025;
originally announced March 2025.
-
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
Authors:
Chris Kolb,
Tobias Weber,
Bernd Bischl,
David Rügamer
Abstract:
Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging due to the non-differentiability of penalties like the $L_1$ norm, which is incompatible with stochastic gradient descent. A promising alternative is shallow weight factorization, where weights are decomposed into two factors, allowing for smooth optimization of…
▽ More
Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging due to the non-differentiability of penalties like the $L_1$ norm, which is incompatible with stochastic gradient descent. A promising alternative is shallow weight factorization, where weights are decomposed into two factors, allowing for smooth optimization of $L_1$-penalized neural networks by adding differentiable $L_2$ regularization to the factors. In this work, we introduce deep weight factorization, extending previous shallow approaches to more than two factors. We theoretically establish equivalence of our deep factorization with non-convex sparse regularization and analyze its impact on training dynamics and optimization. Due to the limitations posed by standard training practices, we propose a tailored initialization scheme and identify important learning rate requirements necessary for training factorized networks. We demonstrate the effectiveness of our deep weight factorization through experiments on various architectures and datasets, consistently outperforming its shallow counterpart and widely used pruning methods.
△ Less
Submitted 7 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
A Functional Extension of Semi-Structured Networks
Authors:
David Rügamer,
Bernard X. W. Liew,
Zainab Altai,
Almond Stöcker
Abstract:
Semi-structured networks (SSNs) merge the structures familiar from additive models with deep neural networks, allowing the modeling of interpretable partial feature effects while capturing higher-order non-linearities at the same time. A significant challenge in this integration is maintaining the interpretability of the additive model component. Inspired by large-scale biomechanics datasets, this…
▽ More
Semi-structured networks (SSNs) merge the structures familiar from additive models with deep neural networks, allowing the modeling of interpretable partial feature effects while capturing higher-order non-linearities at the same time. A significant challenge in this integration is maintaining the interpretability of the additive model component. Inspired by large-scale biomechanics datasets, this paper explores extending SSNs to functional data. Existing methods in functional data analysis are promising but often not expressive enough to account for all interactions and non-linearities and do not scale well to large datasets. Although the SSN approach presents a compelling potential solution, its adaptation to functional data remains complex. In this work, we propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability. Our numerical experiments demonstrate that this approach accurately recovers underlying signals, enhances predictive performance, and performs favorably compared to competing methods.
△ Less
Submitted 13 October, 2024; v1 submitted 7 October, 2024;
originally announced October 2024.
-
Achieving interpretable machine learning by functional decomposition of black-box models into explainable predictor effects
Authors:
David Köhler,
David Rügamer,
Matthias Schmid
Abstract:
Machine learning (ML) has seen significant growth in both popularity and importance. The high prediction accuracy of ML models is often achieved through complex black-box architectures that are difficult to interpret. This interpretability problem has been hindering the use of ML in fields like medicine, ecology and insurance, where an understanding of the inner workings of the model is paramount…
▽ More
Machine learning (ML) has seen significant growth in both popularity and importance. The high prediction accuracy of ML models is often achieved through complex black-box architectures that are difficult to interpret. This interpretability problem has been hindering the use of ML in fields like medicine, ecology and insurance, where an understanding of the inner workings of the model is paramount to ensure user acceptance and fairness. The need for interpretable ML models has boosted research in the field of interpretable machine learning (IML). Here we propose a novel approach for the functional decomposition of black-box predictions, which is considered a core concept of IML. The idea of our method is to replace the prediction function by a surrogate model consisting of simpler subfunctions. Similar to additive regression models, these functions provide insights into the direction and strength of the main feature contributions and their interactions. Our method is based on a novel concept termed stacked orthogonality, which ensures that the main effects capture as much functional behavior as possible and do not contain information explained by higher-order interactions. Unlike earlier functional IML approaches, it is neither affected by extrapolation nor by hidden feature interactions. To compute the subfunctions, we propose an algorithm based on neural additive modeling and an efficient post-hoc orthogonalization procedure.
△ Less
Submitted 26 July, 2024;
originally announced July 2024.
-
How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression
Authors:
Lucas Kook,
Chris Kolb,
Philipp Schiele,
Daniel Dold,
Marcel Arpogaus,
Cornelius Fritz,
Philipp F. Baumann,
Philipp Kopper,
Tobias Pielok,
Emilio Dorigatti,
David Rügamer
Abstract:
Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse…
▽ More
Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks opening up new avenues in both statistical modeling and deep learning.
△ Less
Submitted 10 July, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
Generalizing Orthogonalization for Models with Non-Linearities
Authors:
David Rügamer,
Chris Kolb,
Tobias Weber,
Lucas Kook,
Thomas Nagler
Abstract:
The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms' application. It was, for instance, shown that neural networks can deduce racial information solely from a patient's X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic…
▽ More
The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms' application. It was, for instance, shown that neural networks can deduce racial information solely from a patient's X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic decision-making based on this algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the "orthogonalization" or "normalization" of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method's effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.
△ Less
Submitted 2 June, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Position: Why We Must Rethink Empirical Research in Machine Learning
Authors:
Moritz Herrmann,
F. Julian D. Lange,
Katharina Eggensperger,
Giuseppe Casalicchio,
Marcel Wever,
Matthias Feurer,
David Rügamer,
Eyke Hüllermeier,
Anne-Laure Boulesteix,
Bernd Bischl
Abstract:
We warn against a common but incomplete understanding of empirical research in machine learning that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue…
▽ More
We warn against a common but incomplete understanding of empirical research in machine learning that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical machine learning research is fashioned as confirmatory research while it should rather be considered exploratory.
△ Less
Submitted 25 May, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
On Training Survival Models with Scoring Rules
Authors:
Philipp Kopper,
David Rügamer,
Raphael Sonabend,
Bernd Bischl,
Andreas Bender
Abstract:
Scoring rules are an established way of comparing predictive performances across model classes. In the context of survival analysis, they require adaptation in order to accommodate censoring. This work investigates using scoring rules for model training rather than evaluation. Doing so, we establish a general framework for training survival models that is model agnostic and can learn event time di…
▽ More
Scoring rules are an established way of comparing predictive performances across model classes. In the context of survival analysis, they require adaptation in order to accommodate censoring. This work investigates using scoring rules for model training rather than evaluation. Doing so, we establish a general framework for training survival models that is model agnostic and can learn event time distributions parametrically or non-parametrically. In addition, our framework is not restricted to any specific scoring rule. While we focus on neural network-based implementations, we also provide proof-of-concept implementations using gradient boosting, generalized additive models, and trees. Empirical comparisons on synthetic and real-world data indicate that scoring rules can be successfully incorporated into model training and yield competitive predictive performance with established time-to-event models.
△ Less
Submitted 13 November, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Interpretable Machine Learning for TabPFN
Authors:
David Rundel,
Julius Kobialka,
Constantin von Crailsheim,
Matthias Feurer,
Thomas Nagler,
David Rügamer
Abstract:
The recently developed Prior-Data Fitted Networks (PFNs) have shown very promising results for applications in low-data regimes. The TabPFN model, a special case of PFNs for tabular data, is able to achieve state-of-the-art performance on a variety of classification tasks while producing posterior predictive distributions in mere seconds by in-context learning without the need for learning paramet…
▽ More
The recently developed Prior-Data Fitted Networks (PFNs) have shown very promising results for applications in low-data regimes. The TabPFN model, a special case of PFNs for tabular data, is able to achieve state-of-the-art performance on a variety of classification tasks while producing posterior predictive distributions in mere seconds by in-context learning without the need for learning parameters or hyperparameter tuning. This makes TabPFN a very attractive option for a wide range of domain applications. However, a major drawback of the method is its lack of interpretability. Therefore, we propose several adaptations of popular interpretability methods that we specifically design for TabPFN. By taking advantage of the unique properties of the model, our adaptations allow for more efficient computations than existing implementations. In particular, we show how in-context learning facilitates the estimation of Shapley values by avoiding approximate retraining and enables the use of Leave-One-Covariate-Out (LOCO) even when working with large-scale Transformers. In addition, we demonstrate how data valuation methods can be used to address scalability challenges of TabPFN. Our proposed methods are implemented in a package tabpfn_iml and made available at https://github.com/david-rundel/tabpfn_iml.
△ Less
Submitted 23 July, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
Connecting the Dots: Is Mode-Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?
Authors:
Emanuel Sommer,
Lisa Wimmer,
Theodore Papamarkou,
Ludwig Bothmann,
Bernd Bischl,
David Rügamer
Abstract:
A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks' parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, w…
▽ More
A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks' parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, we establish practical guidelines for sampling and convergence diagnosis. As a result, we present a deep ensemble initialized approach as an effective solution with competitive performance and uncertainty quantification.
△ Less
Submitted 27 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Scalable Higher-Order Tensor Product Spline Models
Authors:
David Rügamer
Abstract:
In the current era of vast data and transparent machine learning, it is essential for techniques to operate at a large scale while providing a clear mathematical comprehension of the internal workings of the method. Although there already exist interpretable semi-parametric regression methods for large-scale applications that take into account non-linearity in the data, the complexity of the model…
▽ More
In the current era of vast data and transparent machine learning, it is essential for techniques to operate at a large scale while providing a clear mathematical comprehension of the internal workings of the method. Although there already exist interpretable semi-parametric regression methods for large-scale applications that take into account non-linearity in the data, the complexity of the models is still often limited. One of the main challenges is the absence of interactions in these models, which are left out for the sake of better interpretability but also due to impractical computational costs. To overcome this limitation, we propose a new approach using a factorization method to derive a highly scalable higher-order tensor product spline model. Our method allows for the incorporation of all (higher-order) interactions of non-linear feature effects while having computational costs proportional to a model without interactions. We further develop a meaningful penalization scheme and examine the induced optimization problem. We conclude by evaluating the predictive and estimation performance of our method.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI
Authors:
Theodore Papamarkou,
Maria Skoularidou,
Konstantina Palla,
Laurence Aitchison,
Julyan Arbel,
David Dunson,
Maurizio Filippone,
Vincent Fortuin,
Philipp Hennig,
José Miguel Hernández-Lobato,
Aliaksandr Hubin,
Alexander Immer,
Theofanis Karaletsos,
Mohammad Emtiyaz Khan,
Agustinus Kristiadi,
Yingzhen Li,
Stephan Mandt,
Christopher Nemeth,
Michael A. Osborne,
Tim G. J. Rudner,
David Rügamer,
Yee Whye Teh,
Max Welling,
Andrew Gordon Wilson,
Ruqi Zhang
Abstract:
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…
▽ More
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.
△ Less
Submitted 6 August, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Bayesian Semi-structured Subspace Inference
Authors:
Daniel Dold,
David Rügamer,
Beate Sick,
Oliver Dürr
Abstract:
Semi-structured regression models enable the joint modeling of interpretable structured and complex unstructured feature effects. The structured model part is inspired by statistical models and can be used to infer the input-output relationship for features of particular importance. The complex unstructured part defines an arbitrary deep neural network and thereby provides enough flexibility to ac…
▽ More
Semi-structured regression models enable the joint modeling of interpretable structured and complex unstructured feature effects. The structured model part is inspired by statistical models and can be used to infer the input-output relationship for features of particular importance. The complex unstructured part defines an arbitrary deep neural network and thereby provides enough flexibility to achieve competitive prediction performance. While these models can also account for aleatoric uncertainty, there is still a lack of work on accounting for epistemic uncertainty. In this paper, we address this problem by presenting a Bayesian approximation for semi-structured regression models using subspace inference. To this end, we extend subspace inference for joint posterior sampling from a full parameter space for structured effects and a subspace for unstructured effects. Apart from this hybrid sampling scheme, our method allows for tunable complexity of the subspace and can capture multiple minima in the loss landscape. Numerical experiments validate our approach's efficacy in recovering structured effect parameter posteriors in semi-structured models and approaching the full-space posterior distribution of MCMC for increasing subspace dimension. Further, our approach exhibits competitive predictive performance across simulated and real-world datasets.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
-
Functional Data Analysis: An Introduction and Recent Developments
Authors:
Jan Gertheiss,
David Rügamer,
Bernard X. W. Liew,
Sonja Greven
Abstract:
Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionalit…
▽ More
Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionality of observations and parameters, respectively. This paper provides an introduction to FDA, including a description of the most common statistical analysis techniques, their respective software implementations, and some recent developments in the field. The paper covers fundamental concepts such as descriptives and outliers, smoothing, amplitude and phase variation, and functional principal component analysis. It also discusses functional regression, statistical inference with functional data, functional classification and clustering, and machine learning approaches for functional data analysis. The methods discussed in this paper are widely applicable in fields such as medicine, biophysics, neuroscience, and chemistry, and are increasingly relevant due to the widespread use of technologies that allow for the collection of functional data. Sparse functional data methods are also relevant for longitudinal data analysis. All presented methods are demonstrated using available software in R by analyzing a data set on human motion and motor control. To facilitate the understanding of the methods, their implementation, and hands-on application, the code for these practical examples is made available on Github: https://github.com/davidruegamer/FDA_tutorial .
△ Less
Submitted 9 December, 2023;
originally announced December 2023.
-
Post-hoc Orthogonalization for Mitigation of Protected Feature Bias in CXR Embeddings
Authors:
Tobias Weber,
Michael Ingrisch,
Bernd Bischl,
David Rügamer
Abstract:
Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trai…
▽ More
Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Apart from removing any influence on pathology classification, while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray image classification.
△ Less
Submitted 11 June, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization
Authors:
Chris Kolb,
Christian L. Müller,
Bernd Bischl,
David Rügamer
Abstract:
We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models and regularizers. In contrast, our method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep l…
▽ More
We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models and regularizers. In contrast, our method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning. The proposed optimization transfer comprises an overparameterization of selected parameters and a change of penalties. In the overparametrized problem, smooth surrogate regularization induces non-smooth, sparse regularization in the base parametrization. We prove that the surrogate objective is equivalent in the sense that it not only has identical global minima but also matching local minima, thereby avoiding the introduction of spurious solutions. Additionally, our theory establishes results of independent interest regarding matching local minima for arbitrary, potentially unregularized, objectives. We comprehensively review sparsity-inducing parametrizations across different fields that are covered by our general theory, extend their scope, and propose improvements in several aspects. Numerical experiments further demonstrate the correctness and effectiveness of our approach on several sparse learning problems ranging from high-dimensional regression to sparse neural network training.
△ Less
Submitted 26 April, 2024; v1 submitted 7 July, 2023;
originally announced July 2023.
-
A New PHO-rmula for Improved Performance of Semi-Structured Networks
Authors:
David Rügamer
Abstract:
Recent advances to combine structured regression models and deep neural networks for better interpretability, more expressiveness, and statistically valid uncertainty quantification demonstrate the versatility of semi-structured neural networks (SSNs). We show that techniques to properly identify the contributions of the different model components in SSNs, however, lead to suboptimal network estim…
▽ More
Recent advances to combine structured regression models and deep neural networks for better interpretability, more expressiveness, and statistically valid uncertainty quantification demonstrate the versatility of semi-structured neural networks (SSNs). We show that techniques to properly identify the contributions of the different model components in SSNs, however, lead to suboptimal network estimation, slower convergence, and degenerated or erroneous predictions. In order to solve these problems while preserving favorable model properties, we propose a non-invasive post-hoc orthogonalization (PHO) that guarantees identifiability of model components and provides better estimation and prediction quality. Our theoretical findings are supported by numerical experiments, a benchmark comparison as well as a real-world application to COVID-19 infections.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry
Authors:
Jonas Gregor Wiese,
Lisa Wimmer,
Theodore Papamarkou,
Bernd Bischl,
Stephan Günnemann,
David Rügamer
Abstract:
Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that…
▽ More
Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.
△ Less
Submitted 6 April, 2023;
originally announced April 2023.
-
mixdistreg: An R Package for Fitting Mixture of Experts Distributional Regression with Adaptive First-order Methods
Authors:
David Rügamer
Abstract:
This paper presents a high-level description of the R software package mixdistreg to fit mixture of experts distributional regression models. The proposed framework is implemented in R using the deepregression software template, which is based on TensorFlow and follows the neural structured additive learning principle. The software comprises various approaches as special cases, including mixture d…
▽ More
This paper presents a high-level description of the R software package mixdistreg to fit mixture of experts distributional regression models. The proposed framework is implemented in R using the deepregression software template, which is based on TensorFlow and follows the neural structured additive learning principle. The software comprises various approaches as special cases, including mixture density networks and mixture regression approaches. Various code examples are given to demonstrate the package's functionality.
△ Less
Submitted 3 February, 2023;
originally announced February 2023.
-
Estimating Conditional Distributions with Neural Networks using R package deeptrafo
Authors:
Lucas Kook,
Philipp FM Baumann,
Oliver Dürr,
Beate Sick,
David Rügamer
Abstract:
Contemporary empirical applications frequently require flexible regression models for complex response types and large tabular or non-tabular, including image or text, data. Classical regression models either break down under the computational load of processing such data or require additional manual feature extraction to make these problems tractable. Here, we present deeptrafo, a package for fit…
▽ More
Contemporary empirical applications frequently require flexible regression models for complex response types and large tabular or non-tabular, including image or text, data. Classical regression models either break down under the computational load of processing such data or require additional manual feature extraction to make these problems tractable. Here, we present deeptrafo, a package for fitting flexible regression models for conditional distributions using a tensorflow backend with numerous additional processors, such as neural networks, penalties, and smoothing splines. Package deeptrafo implements deep conditional transformation models (DCTMs) for binary, ordinal, count, survival, continuous, and time series responses, potentially with uninformative censoring. Unlike other available methods, DCTMs do not assume a parametric family of distributions for the response. Further, the data analyst may trade off interpretability and flexibility by supplying custom neural network architectures and smoothers for each term in an intuitive formula interface. We demonstrate how to set up, fit, and work with DCTMs for several response types. We further showcase how to construct ensembles of these models, evaluate models using inbuilt cross-validation, and use other convenience functions for DCTMs in several applications. Lastly, we discuss DCTMs in light of other approaches to regression with non-tabular data.
△ Less
Submitted 28 May, 2024; v1 submitted 24 November, 2022;
originally announced November 2022.
-
Mixture of Experts Distributional Regression: Implementation Using Robust Estimation with Adaptive First-order Methods
Authors:
David Rügamer,
Florian Pfisterer,
Bernd Bischl,
Bettina Grün
Abstract:
In this work, we propose an efficient implementation of mixtures of experts distributional regression models which exploits robust estimation by using stochastic first-order optimization techniques with adaptive learning rate schedulers. We take advantage of the flexibility and scalability of neural network software and implement the proposed framework in mixdistreg, an R software package that all…
▽ More
In this work, we propose an efficient implementation of mixtures of experts distributional regression models which exploits robust estimation by using stochastic first-order optimization techniques with adaptive learning rate schedulers. We take advantage of the flexibility and scalability of neural network software and implement the proposed framework in mixdistreg, an R software package that allows for the definition of mixtures of many different families, estimation in high-dimensional and large sample size settings and robust optimization based on TensorFlow. Numerical experiments with simulated and real-world data applications show that optimization is as reliable as estimation via classical approaches in many different settings and that results may be obtained for complicated scenarios where classical approaches consistently fail.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
Uncertainty-aware predictive modeling for fair data-driven decisions
Authors:
Patrick Kaiser,
Christoph Kern,
David Rügamer
Abstract:
Both industry and academia have made considerable progress in developing trustworthy and responsible machine learning (ML) systems. While critical concepts like fairness and explainability are often addressed, the safety of systems is typically not sufficiently taken into account. By viewing data-driven decision systems as socio-technical systems, we draw on the uncertainty in ML literature to sho…
▽ More
Both industry and academia have made considerable progress in developing trustworthy and responsible machine learning (ML) systems. While critical concepts like fairness and explainability are often addressed, the safety of systems is typically not sufficiently taken into account. By viewing data-driven decision systems as socio-technical systems, we draw on the uncertainty in ML literature to show how fairML systems can also be safeML systems. We posit that a fair model needs to be an uncertainty-aware model, e.g. by drawing on distributional regression. For fair decisions, we argue that a safe fail option should be used for individuals with uncertain categorization. We introduce semi-structured deep distributional regression as a modeling framework which addresses multiple concerns brought against standard ML models and show its use in a real-world example of algorithmic profiling of job seekers.
△ Less
Submitted 4 November, 2022;
originally announced November 2022.
-
Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models
Authors:
Daniel Schalk,
Bernd Bischl,
David Rügamer
Abstract:
Various privacy-preserving frameworks that respect the individual's privacy in the analysis of data have been developed in recent years. However, available model classes such as simple statistics or generalized linear models lack the flexibility required for a good approximation of the underlying data-generating process in practice. In this paper, we propose an algorithm for a distributed, privacy…
▽ More
Various privacy-preserving frameworks that respect the individual's privacy in the analysis of data have been developed in recent years. However, available model classes such as simple statistics or generalized linear models lack the flexibility required for a good approximation of the underlying data-generating process in practice. In this paper, we propose an algorithm for a distributed, privacy-preserving, and lossless estimation of generalized additive mixed models (GAMM) using component-wise gradient boosting (CWB). Making use of CWB allows us to reframe the GAMM estimation as a distributed fitting of base learners using the $L_2$-loss. In order to account for the heterogeneity of different data location sites, we propose a distributed version of a row-wise tensor product that allows the computation of site-specific (smooth) effects. Our adaption of CWB preserves all the important properties of the original algorithm, such as an unbiased feature selection and the feasibility to fit models in high-dimensional feature spaces, and yields equivalent model estimates as CWB on pooled data. Next to a derivation of the equivalence of both algorithms, we also showcase the efficacy of our algorithm on a distributed heart disease data set and compare it with state-of-the-art methods.
△ Less
Submitted 10 March, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
ARMA Cell: A Modular and Effective Approach for Neural Autoregressive Modeling
Authors:
Philipp Schiele,
Christoph Berninger,
David Rügamer
Abstract:
The autoregressive moving average (ARMA) model is a classical, and arguably one of the most studied approaches to model time series data. It has compelling theoretical properties and is widely used among practitioners. More recent deep learning approaches popularize recurrent neural networks (RNNs) and, in particular, Long Short-Term Memory (LSTM) cells that have become one of the best performing…
▽ More
The autoregressive moving average (ARMA) model is a classical, and arguably one of the most studied approaches to model time series data. It has compelling theoretical properties and is widely used among practitioners. More recent deep learning approaches popularize recurrent neural networks (RNNs) and, in particular, Long Short-Term Memory (LSTM) cells that have become one of the best performing and most common building blocks in neural time series modeling. While advantageous for time series data or sequences with long-term effects, complex RNN cells are not always a must and can sometimes even be inferior to simpler recurrent approaches. In this work, we introduce the ARMA cell, a simpler, modular, and effective approach for time series modeling in neural networks. This cell can be used in any neural network architecture where recurrent structures are present and naturally handles multivariate time series using vector autoregression. We also introduce the ConvARMA cell as a natural successor for spatially-correlated time series. Our experiments show that the proposed methodology is competitive with popular alternatives in terms of performance while being more robust and compelling due to its simplicity
△ Less
Submitted 11 January, 2024; v1 submitted 31 August, 2022;
originally announced August 2022.
-
Additive Higher-Order Factorization Machines
Authors:
David Rügamer
Abstract:
In the age of big data and interpretable machine learning, approaches need to work at scale and at the same time allow for a clear mathematical understanding of the method's inner workings. While there exist inherently interpretable semi-parametric regression techniques for large-scale applications to account for non-linearity in the data, their model complexity is still often restricted. One of t…
▽ More
In the age of big data and interpretable machine learning, approaches need to work at scale and at the same time allow for a clear mathematical understanding of the method's inner workings. While there exist inherently interpretable semi-parametric regression techniques for large-scale applications to account for non-linearity in the data, their model complexity is still often restricted. One of the main limitations are missing interactions in these models, which are not included for the sake of better interpretability, but also due to untenable computational costs. To address this shortcoming, we derive a scalable high-order tensor product spline model using a factorization approach. Our method allows to include all (higher-order) interactions of non-linear feature effects while having computational costs proportional to a model without interactions. We prove both theoretically and empirically that our methods scales notably better than existing approaches, derive meaningful penalization schemes and also discuss further theoretical aspects. We finally investigate predictive and estimation performance both with synthetic and real data.
△ Less
Submitted 1 February, 2023; v1 submitted 28 May, 2022;
originally announced May 2022.
-
Factorized Structured Regression for Large-Scale Varying Coefficient Models
Authors:
David Rügamer,
Andreas Bender,
Simon Wiegrebe,
Daniel Racek,
Bernd Bischl,
Christian L. Müller,
Clemens Stachl
Abstract:
Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structur…
▽ More
Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structured regression models, including time-aware varying coefficients models, are, however, limited in their applicability to categorical effects and inclusion of a large number of interactions. Here, we propose Factorized Structured Regression (FaStR) for scalable varying coefficient models. FaStR overcomes limitations of general regression models for large-scale data by combining structured additive regression and factorization approaches in a neural network-based model implementation. This fusion provides a scalable framework for the estimation of statistical models in previously infeasible data settings. Empirical results confirm that the estimation of varying coefficients of our approach is on par with state-of-the-art regression techniques, while scaling notably better and also being competitive with other time-aware RS in terms of prediction performance. We illustrate FaStR's performance and interpretability on a large-scale behavioral study with smartphone user data.
△ Less
Submitted 25 May, 2022;
originally announced May 2022.
-
DeepPAMM: Deep Piecewise Exponential Additive Mixed Models for Complex Hazard Structures in Survival Analysis
Authors:
Philipp Kopper,
Simon Wiegrebe,
Bernd Bischl,
Andreas Bender,
David Rügamer
Abstract:
Survival analysis (SA) is an active field of research that is concerned with time-to-event outcomes and is prevalent in many domains, particularly biomedical applications. Despite its importance, SA remains challenging due to small-scale data sets and complex outcome distributions, concealed by truncation and censoring processes. The piecewise exponential additive mixed model (PAMM) is a model cla…
▽ More
Survival analysis (SA) is an active field of research that is concerned with time-to-event outcomes and is prevalent in many domains, particularly biomedical applications. Despite its importance, SA remains challenging due to small-scale data sets and complex outcome distributions, concealed by truncation and censoring processes. The piecewise exponential additive mixed model (PAMM) is a model class addressing many of these challenges, yet PAMMs are not applicable in high-dimensional feature settings or in the case of unstructured or multimodal data. We unify existing approaches by proposing DeepPAMM, a versatile deep learning framework that is well-founded from a statistical point of view, yet with enough flexibility for modeling complex hazard structures. We illustrate that DeepPAMM is competitive with other machine learning approaches with respect to predictive performance while maintaining interpretability through benchmark experiments and an extended case study.
△ Less
Submitted 12 February, 2022;
originally announced February 2022.
-
Identifying the atmospheric drivers of drought and heat using a smoothed deep learning approach
Authors:
Magdalena Mittermeier,
Maximilian Weigert,
David Rügamer
Abstract:
Europe was hit by several, disastrous heat and drought events in recent summers. Besides thermodynamic influences, such hot and dry extremes are driven by certain atmospheric situations including anticyclonic conditions. Effects of climate change on atmospheric circulations are complex and many open research questions remain in this context, e.g., on future trends of anticyclonic conditions. Based…
▽ More
Europe was hit by several, disastrous heat and drought events in recent summers. Besides thermodynamic influences, such hot and dry extremes are driven by certain atmospheric situations including anticyclonic conditions. Effects of climate change on atmospheric circulations are complex and many open research questions remain in this context, e.g., on future trends of anticyclonic conditions. Based on the combination of a catalog of labeled circulation patterns and spatial atmospheric variables, we propose a smoothed convolutional neural network classifier for six types of anticyclonic circulations that are associated with drought and heat. Our work can help to identify important drivers of hot and dry extremes in climate simulations, which allows to unveil the impact of climate change on these drivers. We address various challenges inherent to circulation pattern classification that are also present in other climate patterns, e.g., subjective labels and unambiguous transition periods.
△ Less
Submitted 9 November, 2021;
originally announced November 2021.
-
Accelerated Componentwise Gradient Boosting using Efficient Data Representation and Momentum-based Optimization
Authors:
Daniel Schalk,
Bernd Bischl,
David Rügamer
Abstract:
Componentwise boosting (CWB), also known as model-based boosting, is a variant of gradient boosting that builds on additive models as base learners to ensure interpretability. CWB is thus often used in research areas where models are employed as tools to explain relationships in data. One downside of CWB is its computational complexity in terms of memory and runtime. In this paper, we propose two…
▽ More
Componentwise boosting (CWB), also known as model-based boosting, is a variant of gradient boosting that builds on additive models as base learners to ensure interpretability. CWB is thus often used in research areas where models are employed as tools to explain relationships in data. One downside of CWB is its computational complexity in terms of memory and runtime. In this paper, we propose two techniques to overcome these issues without losing the properties of CWB: feature discretization of numerical features and incorporating Nesterov momentum into functional gradient descent. As the latter can be prone to early overfitting, we also propose a hybrid approach that prevents a possibly diverging gradient descent routine while ensuring faster convergence. We perform extensive benchmarks on multiple simulated and real-world data sets to demonstrate the improvements in runtime and memory consumption while maintaining state-of-the-art estimation and prediction performance.
△ Less
Submitted 29 October, 2021; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Automatic Componentwise Boosting: An Interpretable AutoML System
Authors:
Stefan Coors,
Daniel Schalk,
Bernd Bischl,
David Rügamer
Abstract:
In practice, machine learning (ML) workflows require various different steps, from data preprocessing, missing value imputation, model selection, to model tuning as well as model evaluation. Many of these steps rely on human ML experts. AutoML - the field of automating these ML pipelines - tries to help practitioners to apply ML off-the-shelf without any expert knowledge. Most modern AutoML system…
▽ More
In practice, machine learning (ML) workflows require various different steps, from data preprocessing, missing value imputation, model selection, to model tuning as well as model evaluation. Many of these steps rely on human ML experts. AutoML - the field of automating these ML pipelines - tries to help practitioners to apply ML off-the-shelf without any expert knowledge. Most modern AutoML systems like auto-sklearn, H20-AutoML or TPOT aim for high predictive performance, thereby generating ensembles that consist almost exclusively of black-box models. This, in turn, makes the interpretation for the layperson more intricate and adds another layer of opacity for users. We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm. Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions, allows for a straightforward calculation of feature importance, and gives insights into the required model complexity to fit the given task. We introduce the general framework and outline its implementation autocompboost. To demonstrate the frameworks efficacy, we compare autocompboost to other existing systems based on the OpenML AutoML-Benchmark. Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets while being more user-friendly and transparent.
△ Less
Submitted 16 October, 2021; v1 submitted 12 September, 2021;
originally announced September 2021.
-
deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression
Authors:
David Rügamer,
Chris Kolb,
Cornelius Fritz,
Florian Pfisterer,
Philipp Kopper,
Bernd Bischl,
Ruolin Shen,
Christina Bukas,
Lisa Barros de Andrade e Sousa,
Dominik Thalmeier,
Philipp Baumann,
Lucas Kook,
Nadja Klein,
Christian L. Müller
Abstract:
In this paper we describe the implementation of semi-structured deep distributional regression, a flexible framework to learn conditional distributions based on the combination of additive regression models and deep networks. Our implementation encompasses (1) a modular neural network building system based on the deep learning library \pkg{TensorFlow} for the fusion of various statistical and deep…
▽ More
In this paper we describe the implementation of semi-structured deep distributional regression, a flexible framework to learn conditional distributions based on the combination of additive regression models and deep networks. Our implementation encompasses (1) a modular neural network building system based on the deep learning library \pkg{TensorFlow} for the fusion of various statistical and deep learning approaches, (2) an orthogonalization cell to allow for an interpretable combination of different subnetworks, as well as (3) pre-processing steps necessary to set up such models. The software package allows to define models in a user-friendly manner via a formula interface that is inspired by classical statistical model frameworks such as \pkg{mgcv}. The packages' modular design and functionality provides a unique resource for both scalable estimation of complex statistical models and the combination of approaches from deep learning and statistics. This allows for state-of-the-art predictive performance while simultaneously retaining the indispensable interpretability of classical statistical models.
△ Less
Submitted 10 March, 2022; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Combining Graph Neural Networks and Spatio-temporal Disease Models to Predict COVID-19 Cases in Germany
Authors:
Cornelius Fritz,
Emilio Dorigatti,
David Rügamer
Abstract:
During 2020, the infection rate of COVID-19 has been investigated by many scholars from different research fields. In this context, reliable and interpretable forecasts of disease incidents are a vital tool for policymakers to manage healthcare resources. Several experts have called for the necessity to account for human mobility to explain the spread of COVID-19. Existing approaches are often app…
▽ More
During 2020, the infection rate of COVID-19 has been investigated by many scholars from different research fields. In this context, reliable and interpretable forecasts of disease incidents are a vital tool for policymakers to manage healthcare resources. Several experts have called for the necessity to account for human mobility to explain the spread of COVID-19. Existing approaches are often applying standard models of the respective research field. This habit, however, often comes along with certain restrictions. For instance, most statistical or epidemiological models cannot directly incorporate unstructured data sources, including relational data that may encode human mobility. In contrast, machine learning approaches may yield better predictions by exploiting these data structures, yet lack intuitive interpretability as they are often categorized as black-box models. We propose a trade-off between both research directions and present a multimodal learning approach that combines the advantages of statistical regression and machine learning models for predicting local COVID-19 cases in Germany. This novel approach enables the use of a richer collection of data types, including mobility flows and colocation probabilities, and yields the lowest MSE scores throughout our observational period in our benchmark study. The results corroborate the necessity of including mobility data and showcase the flexibility and interpretability of our approach.
△ Less
Submitted 3 January, 2021;
originally announced January 2021.
-
Semi-Structured Deep Piecewise Exponential Models
Authors:
Philipp Kopper,
Sebastian Pölsterl,
Christian Wachinger,
Bernd Bischl,
Andreas Bender,
David Rügamer
Abstract:
We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The presented framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for estimation of time-varying effects and time-varying features. To also include multiple data so…
▽ More
We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The presented framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for estimation of time-varying effects and time-varying features. To also include multiple data sources and higher-order interaction effects into the model, we embed the model class in a neural network and thereby enable the simultaneous estimation of both inherently interpretable structured regression inputs as well as deep neural network components which can potentially process additional unstructured data sources. A proof of concept is provided by using the framework to predict Alzheimer's disease progression based on tabular and 3D point cloud data and applying it to synthetic data.
△ Less
Submitted 1 March, 2021; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Deep Conditional Transformation Models
Authors:
Philipp F. M. Baumann,
Torsten Hothorn,
David Rügamer
Abstract:
Learning the cumulative distribution function (CDF) of an outcome variable conditional on a set of features remains challenging, especially in high-dimensional settings. Conditional transformation models provide a semi-parametric approach that allows to model a large class of conditional CDFs without an explicit parametric distribution assumption and with only a few parameters. Existing estimation…
▽ More
Learning the cumulative distribution function (CDF) of an outcome variable conditional on a set of features remains challenging, especially in high-dimensional settings. Conditional transformation models provide a semi-parametric approach that allows to model a large class of conditional CDFs without an explicit parametric distribution assumption and with only a few parameters. Existing estimation approaches within this class are, however, either limited in their complexity and applicability to unstructured data sources such as images or text, lack interpretability, or are restricted to certain types of outcomes. We close this gap by introducing the class of deep conditional transformation models which unifies existing approaches and allows to learn both interpretable (non-)linear model terms and more complex neural network predictors in one holistic framework. To this end we propose a novel network architecture, provide details on different model definitions and derive suitable constraints as well as network regularization terms. We demonstrate the efficacy of our approach through numerical experiments and applications.
△ Less
Submitted 6 April, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Neural Mixture Distributional Regression
Authors:
David Rügamer,
Florian Pfisterer,
Bernd Bischl
Abstract:
We present neural mixture distributional regression (NMDR), a holistic framework to estimate complex finite mixtures of distributional regressions defined by flexible additive predictors. Our framework is able to handle a large number of mixtures of potentially different distributions in high-dimensional settings, allows for efficient and scalable optimization and can be applied to recent concepts…
▽ More
We present neural mixture distributional regression (NMDR), a holistic framework to estimate complex finite mixtures of distributional regressions defined by flexible additive predictors. Our framework is able to handle a large number of mixtures of potentially different distributions in high-dimensional settings, allows for efficient and scalable optimization and can be applied to recent concepts that combine structured regression models with deep neural networks. While many existing approaches for mixture models address challenges in optimization of such and provide results for convergence under specific model assumptions, our approach is assumption-free and instead makes use of optimizers well-established in deep learning. Through extensive numerical experiments and a high-dimensional deep learning application we provide evidence that the proposed approach is competitive to existing approaches and works well in more complex scenarios.
△ Less
Submitted 14 October, 2020;
originally announced October 2020.
-
Selective Inference for Additive and Linear Mixed Models
Authors:
David Rügamer,
Philipp F. M. Baumann,
Sonja Greven
Abstract:
This work addresses the problem of conducting valid inference for additive and linear mixed models after model selection. One possible solution to overcome overconfident inference results after model selection is selective inference, which constitutes a post-selection inference framework, yielding valid inference statements by conditioning on the selection event. We extend recent work on selective…
▽ More
This work addresses the problem of conducting valid inference for additive and linear mixed models after model selection. One possible solution to overcome overconfident inference results after model selection is selective inference, which constitutes a post-selection inference framework, yielding valid inference statements by conditioning on the selection event. We extend recent work on selective inference to the class of additive and linear mixed models for any type of model selection mechanism that can be expressed as a function of the outcome variable (and potentially on covariates on which it conditions). We investigate the properties of our proposal in simulation studies and apply the framework to a data set in monetary economics. Due to the generality of our proposed approach, the presented approach also works for non-standard selection procedures, which we demonstrate in our application. Here, the final additive mixed model is selected using a hierarchical selection procedure, which is based on the conditional Akaike information criterion and involves varying data set sizes.
△ Less
Submitted 20 December, 2020; v1 submitted 15 July, 2020;
originally announced July 2020.
-
A General Machine Learning Framework for Survival Analysis
Authors:
Andreas Bender,
David Rügamer,
Fabian Scheipl,
Bernd Bischl
Abstract:
The modeling of time-to-event data, also known as survival analysis, requires specialized methods that can deal with censoring and truncation, time-varying features and effects, and that extend to settings with multiple competing events. However, many machine learning methods for survival analysis only consider the standard setting with right-censored data and proportional hazards assumption. The…
▽ More
The modeling of time-to-event data, also known as survival analysis, requires specialized methods that can deal with censoring and truncation, time-varying features and effects, and that extend to settings with multiple competing events. However, many machine learning methods for survival analysis only consider the standard setting with right-censored data and proportional hazards assumption. The methods that do provide extensions usually address at most a subset of these challenges and often require specialized software that can not be integrated into standard machine learning workflows directly. In this work, we present a very general machine learning framework for time-to-event analysis that uses a data augmentation strategy to reduce complex survival tasks to standard Poisson regression tasks. This reformulation is based on well developed statistical theory. With the proposed approach, any algorithm that can optimize a Poisson (log-)likelihood, such as gradient boosted trees, deep neural networks, model-based boosting and many more can be used in the context of time-to-event analysis. The proposed technique does not require any assumptions with respect to the distribution of event times or the functional shapes of feature and interaction effects. Based on the proposed framework we develop new methods that are competitive with specialized state of the art approaches in terms of accuracy, and versatility, but with comparatively small investments of programming effort or requirements for specialized methodological know-how.
△ Less
Submitted 17 April, 2021; v1 submitted 27 June, 2020;
originally announced June 2020.
-
A Bayesian Time-Varying Autoregressive Model for Improved Short- and Long-Term Prediction
Authors:
Christoph Berninger,
Almond Stöcker,
David Rügamer
Abstract:
Motivated by the application to German interest rates, we propose a timevarying autoregressive model for short and long term prediction of time series that exhibit a temporary non-stationary behavior but are assumed to mean revert in the long run. We use a Bayesian formulation to incorporate prior assumptions on the mean reverting process in the model and thereby regularize predictions in the far…
▽ More
Motivated by the application to German interest rates, we propose a timevarying autoregressive model for short and long term prediction of time series that exhibit a temporary non-stationary behavior but are assumed to mean revert in the long run. We use a Bayesian formulation to incorporate prior assumptions on the mean reverting process in the model and thereby regularize predictions in the far future. We use MCMC-based inference by deriving relevant full conditional distributions and employ a Metropolis-Hastings within Gibbs Sampler approach to sample from the posterior (predictive) distribution. In combining data-driven short term predictions with long term distribution assumptions our model is competitive to the existing methods in the short horizon while yielding reasonable predictions in the long run. We apply our model to interest rate data and contrast the forecasting performance to the one of a 2-Additive-Factor Gaussian model as well as to the predictions of a dynamic Nelson-Siegel model.
△ Less
Submitted 21 February, 2021; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Semi-Structured Distributional Regression -- Extending Structured Additive Models by Arbitrary Deep Neural Networks and Data Modalities
Authors:
David Rügamer,
Chris Kolb,
Nadja Klein
Abstract:
Combining additive models and neural networks allows to broaden the scope of statistical regression and extend deep learning-based approaches by interpretable structured additive predictors at the same time. Existing attempts uniting the two modeling approaches are, however, limited to very specific combinations and, more importantly, involve an identifiability issue. As a consequence, interpretab…
▽ More
Combining additive models and neural networks allows to broaden the scope of statistical regression and extend deep learning-based approaches by interpretable structured additive predictors at the same time. Existing attempts uniting the two modeling approaches are, however, limited to very specific combinations and, more importantly, involve an identifiability issue. As a consequence, interpretability and stable estimation are typically lost. We propose a general framework to combine structured regression models and deep neural networks into a unifying network architecture. To overcome the inherent identifiability issues between different model parts, we construct an orthogonalization cell that projects the deep neural network into the orthogonal complement of the statistical model predictor. This enables proper estimation of structured model parts and thereby interpretability. We demonstrate the framework's efficacy in numerical experiments and illustrate its special merits in benchmarks and real-world applications.
△ Less
Submitted 9 July, 2022; v1 submitted 13 February, 2020;
originally announced February 2020.
-
Inference for $L_2$-Boosting
Authors:
David Rügamer,
Sonja Greven
Abstract:
We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD) under normality assumption for model errors, also known as $L_2$-Boosting. The CFGD is one of the most versatile tools to analyze data, because it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models and incorporates inbuilt vari…
▽ More
We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD) under normality assumption for model errors, also known as $L_2$-Boosting. The CFGD is one of the most versatile tools to analyze data, because it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models and incorporates inbuilt variable selection. Due to the variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component to update, necessitates adaptations and extensions to existing approaches. We propose tests and confidence intervals for linear, grouped and penalized additive model components selected by $L_2$-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques which restrict the response space to more complex sets than polyhedra. We apply our framework to an additive model for sales prices of residential apartments and investigate the properties of our concepts in simulation studies.
△ Less
Submitted 4 June, 2019; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Conditional Model Selection in Mixed-Effects Models with cAIC4
Authors:
Benjamin Säfken,
David Rügamer,
Thomas Kneib,
Sonja Greven
Abstract:
Model selection in mixed models based on the conditional distribution is appropriate for many practical applications and has been a focus of recent statistical research. In this paper we introduce the R-package cAIC4 that allows for the computation of the conditional Akaike Information Criterion (cAIC). Computation of the conditional AIC needs to take into account the uncertainty of the random eff…
▽ More
Model selection in mixed models based on the conditional distribution is appropriate for many practical applications and has been a focus of recent statistical research. In this paper we introduce the R-package cAIC4 that allows for the computation of the conditional Akaike Information Criterion (cAIC). Computation of the conditional AIC needs to take into account the uncertainty of the random effects variance and is therefore not straightforward. We introduce a fast and stable implementation for the calculation of the cAIC for linear mixed models estimated with lme4 and additive mixed models estimated with gamm4 . Furthermore, cAIC4 offers a stepwise function that allows for a fully automated stepwise selection scheme for mixed models based on the conditional AIC. Examples of many possible applications are presented to illustrate the practical impact and easy handling of the package.
△ Less
Submitted 17 March, 2018; v1 submitted 15 March, 2018;
originally announced March 2018.
-
Selective inference after likelihood- or test-based model selection in linear models
Authors:
David Rügamer,
Sonja Greven
Abstract:
Statistical inference after model selection requires an inference framework that takes the selection into account in order to be valid. Following recent work on selective inference, we derive analytical expressions for inference after likelihood- or test-based model selection for linear models.
Statistical inference after model selection requires an inference framework that takes the selection into account in order to be valid. Following recent work on selective inference, we derive analytical expressions for inference after likelihood- or test-based model selection for linear models.
△ Less
Submitted 23 September, 2017; v1 submitted 29 June, 2017;
originally announced June 2017.
-
Boosting Functional Regression Models with FDboost
Authors:
Sarah Brockhaus,
David Rügamer,
Sonja Greven
Abstract:
The R add-on package FDboost is a flexible toolbox for the estimation of functional regression models by model-based boosting. It provides the possibility to fit regression models for scalar and functional response with effects of scalar as well as functional covariates, i.e., scalar-on-function, function-on-scalar and function-on-function regression models. In addition to mean regression, quantil…
▽ More
The R add-on package FDboost is a flexible toolbox for the estimation of functional regression models by model-based boosting. It provides the possibility to fit regression models for scalar and functional response with effects of scalar as well as functional covariates, i.e., scalar-on-function, function-on-scalar and function-on-function regression models. In addition to mean regression, quantile regression models as well as generalized additive models for location scale and shape can be fitted with FDboost. Furthermore, boosting can be used in high-dimensional data settings with more covariates than observations. We provide a hands-on tutorial on model fitting and tuning, including the visualization of results. The methods for scalar-on-function regression are illustrated with spectrometric data of fossil fuels and those for functional response regression with a data set including bioelectrical signals for emotional episodes.
△ Less
Submitted 26 April, 2018; v1 submitted 30 May, 2017;
originally announced May 2017.