-
Boosting Distributional Copula Regression for Bivariate Right-Censored Time-to-Event Data
Authors:
Guillermo Briseno-Sanchez,
Nadja Klein,
Andreas Groll,
Andreas Mayr
Abstract:
We propose a highly flexible distributional copula regression model for bivariate time-to-event data in the presence of right-censoring. The joint survival function of the response is constructed using parametric copulas, allowing for a separate specification of the dependence structure between the time-to-event outcome variables and their respective marginal survival distributions. The latter are specified using well-known parametric distributions such as the log-Normal, log-Logistic (proportional odds model), or Weibull (proportional hazards model) distributions. Hence, the marginal univariate event times can be specified as parametric (also known as Accelerated Failure Time, AFT) models. Embedding our model into the class of generalized additive models for location, scale and shape, possibly all distribution parameters of the joint survival function can depend on covariates. We develop a component-wise gradient-based boosting algorithm for estimation. This way, our approach is able to conduct data-driven variable selection. To the best of our knowledge, this is the first implementation of multivariate AFT models via distributional copula regression with automatic variable selection via statistical boosting. A special merit of our approach is that it works for high-dimensional (p>>n) settings. We illustrate the practical potential of our method on a high-dimensional application related to semi-competing risks responses in ovarian cancer. All of our methods are implemented in the open source statistical software R as add-on functions of the package gamboostLSS.
Submitted 20 December, 2024; v1 submitted 19 December, 2024;
originally announced December 2024.
-
Enhanced variable selection for boosting sparser and less complex models in distributional copula regression
Authors:
Annika Strömer,
Nadja Klein,
Christian Staerk,
Florian Faschingbauer,
Hannah Klinkhammer,
Andreas Mayr
Abstract:
Structured additive distributional copula regression allows modelling the joint distribution of multivariate outcomes by relating all distribution parameters to covariates. Estimation via statistical boosting enables accounting for high-dimensional data and incorporating data-driven variable selection, both of which are useful given the complexity of the model class. However, as known from univariate (distributional) regression, the standard boosting algorithm tends to select too many variables with minor importance, particularly in settings with large sample sizes, leading to complex models with difficult interpretation. To counteract this behavior and to avoid selecting base-learners with only a negligible impact, we combined the ideas of probing, stability selection and a new deselection approach with statistical boosting for distributional copula regression. In a simulation study and an application to the joint modelling of weight and length of newborns, we found that all proposed methods enhance variable selection by reducing the number of false positives. However, only stability selection and the deselection approach yielded similar predictive performance to classical boosting. Finally, the deselection approach scales better to larger datasets and led to a competitive predictive performance, which we further illustrated in a genomic cohort study from the UK Biobank by modelling the joint genetic predisposition for two phenotypes.
Submitted 6 June, 2024;
originally announced June 2024.
-
A Balanced Statistical Boosting Approach for GAMLSS via New Step Lengths
Authors:
Alexandra Daub,
Andreas Mayr,
Boyao Zhang,
Elisabeth Bergherr
Abstract:
Component-wise gradient boosting algorithms are popular for their intrinsic variable selection and implicit regularization, which can be especially beneficial for very flexible model classes. When estimating generalized additive models for location, scale and shape (GAMLSS) by means of a component-wise gradient boosting algorithm, an important part of the estimation procedure is to determine the relative complexity of the submodels corresponding to the different distribution parameters. Existing methods either suffer from a computationally expensive tuning procedure or can be biased by structural differences in the negative gradients' sizes, which, if encountered, lead to imbalances between the different submodels. Shrunk optimal step lengths have been suggested to replace the typical small fixed step lengths for a non-cyclical boosting algorithm limited to a Gaussian response variable in order to address this issue. In this article, we propose a new adaptive step length approach that accounts for the relative size of the fitted base-learners to ensure a natural balance between the different submodels. The new balanced boosting approach thus represents a computationally efficient and easily generalizable alternative to shrunk optimal step lengths. We implemented the balanced non-cyclical boosting algorithm for a Gaussian, a negative binomial as well as a Weibull distributed response variable and demonstrate the competitive performance of the new adaptive step length approach by means of a simulation study, in the analysis of count data modeling the number of doctor's visits as well as for survival data in an oncological trial.
Submitted 12 April, 2024;
originally announced April 2024.
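The component-wise gradient boosting with a small fixed step length that the abstract above takes as its starting point can be illustrated with a minimal sketch. It is not the balanced GAMLSS algorithm of the paper: it assumes a single squared-error loss, simple linear base-learners without intercept, and hypothetical names (`componentwise_l2_boost`, `n_iter`, `step`) chosen for illustration only.

```python
def componentwise_l2_boost(X, y, n_iter=100, step=0.1):
    """Component-wise gradient boosting for squared-error loss (sketch).

    X is a list of rows (lists of floats), y a list of floats.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept = sum(y) / n          # offset: start from the mean of y
    coef = [0.0] * p
    fitted = [intercept] * n
    # Sum of squares per column, needed for the least-squares base-learners.
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]

    for _ in range(n_iter):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f: the residual.
        resid = [y[i] - fitted[i] for i in range(n)]
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant-zero column cannot be fitted
            # Fit each base-learner separately to the negative gradient.
            beta = sum(X[i][j] * resid[i] for i in range(n)) / col_ss[j]
            rss = sum((resid[i] - beta * X[i][j]) ** 2 for i in range(n))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        # Update only the best-fitting component, shrunk by the step length.
        coef[best_j] += step * best_beta
        for i in range(n):
            fitted[i] += step * best_beta * X[i][best_j]
    return intercept, coef


# Toy data: y depends on the first column only.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [2.0 * row[0] + 1.0 for row in X]
intercept, coef = componentwise_l2_boost(X, y)
```

Because only one base-learner is updated per iteration, a covariate that is never selected keeps a coefficient of exactly zero, which is the intrinsic variable selection the abstract refers to. The number of iterations acts as the main tuning parameter.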
-
GNN-VPA: A Variance-Preserving Aggregation Strategy for Graph Neural Networks
Authors:
Lisa Schneckenreiter,
Richard Freinschlag,
Florian Sestak,
Johannes Brandstetter,
Günter Klambauer,
Andreas Mayr
Abstract:
Graph neural networks (GNNs), and especially message-passing neural networks, excel in various domains such as physics, drug discovery, and molecular modeling. The expressivity of GNNs with respect to their ability to discriminate non-isomorphic graphs critically depends on the functions employed for message aggregation and graph-level readout. By applying signal propagation theory, we propose a variance-preserving aggregation function (VPA) that maintains expressivity, but yields improved forward and backward dynamics. Experiments demonstrate that VPA leads to increased predictive performance for popular GNN architectures as well as improved learning dynamics. Our results could pave the way towards normalizer-free or self-normalizing GNNs.
Submitted 7 March, 2024;
originally announced March 2024.
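The variance-preserving aggregation named in the GNN-VPA abstract above can be illustrated with a minimal sketch. The abstract does not state the formula, so the sketch assumes that VPA scales the sum of incoming messages by 1/sqrt(n), where n is the number of messages. The function name `aggregate` and its `mode` argument are hypothetical and exist only for this comparison with sum and mean aggregation.

```python
import math


def aggregate(messages, mode="vpa"):
    """Aggregate a non-empty list of equal-length message vectors.

    mode="sum":  plain sum over messages
    mode="mean": sum divided by n
    mode="vpa":  sum divided by sqrt(n) (assumed form of VPA)
    """
    n = len(messages)
    if n == 0:
        raise ValueError("need at least one message")
    dim = len(messages[0])
    sums = [sum(m[k] for m in messages) for k in range(dim)]
    if mode == "sum":
        return sums
    if mode == "mean":
        return [s / n for s in sums]
    if mode == "vpa":
        return [s / math.sqrt(n) for s in sums]
    raise ValueError(f"unknown mode: {mode}")


# Four identical two-dimensional messages.
msgs = [[1.0, 2.0]] * 4
agg_sum = aggregate(msgs, "sum")
agg_mean = aggregate(msgs, "mean")
agg_vpa = aggregate(msgs, "vpa")
```

The motivation for the square-root scaling is a standard variance argument: for n independent messages with zero mean and unit variance, the sum has variance n and the mean has variance 1/n, while the sum divided by sqrt(n) keeps variance 1.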
-
Boosting Distributional Copula Regression for Bivariate Binary, Discrete and Mixed Responses
Authors:
Guillermo Briseño Sanchez,
Nadja Klein,
Hannah Klinkhammer,
Andreas Mayr
Abstract:
Motivated by challenges in the analysis of biomedical data and observational studies, we develop statistical boosting for the general class of bivariate distributional copula regression with arbitrary marginal distributions, which is suited to model binary, count, continuous or mixed outcomes. In our framework, the joint distribution of arbitrary, bivariate responses is modelled through a parametric copula. To arrive at a model for the entire conditional distribution, not only the marginal distribution parameters but also the copula parameters are related to covariates through additive predictors. We suggest efficient and scalable estimation by means of an adapted component-wise gradient boosting algorithm with statistical models as base-learners. A key benefit of boosting as opposed to classical likelihood or Bayesian estimation is the implicit data-driven variable selection mechanism as well as shrinkage without additional input or assumptions from the analyst. To the best of our knowledge, our implementation is the only one that combines a wide range of covariate effects, marginal distributions, copula functions, and implicit data-driven variable selection. We showcase the versatility of our approach on data from genetic epidemiology, healthcare utilization and childhood undernutrition. Our developments are implemented in the R package gamboostLSS, fostering transparent and reproducible research.
Submitted 4 March, 2024;
originally announced March 2024.
-
Boosting Multivariate Structured Additive Distributional Regression Models
Authors:
Annika Strömer,
Nadja Klein,
Christian Staerk,
Hannah Klinkhammer,
Andreas Mayr
Abstract:
We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all distribution parameters of an arbitrary parametric distribution of a multivariate response conditional on explanatory variables, while being applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable selection, taking various different types of effects into account. As a special merit of our approach, it allows for modelling the association between multiple continuous or discrete outcomes through the relevant covariates. After a detailed simulation study investigating estimation and prediction performance, we demonstrate the full flexibility of our approach in three diverse biomedical applications. The first is based on high-dimensional genomic cohort data from the UK Biobank, considering a bivariate binary response (chronic ischemic heart disease and high cholesterol). Here, we are able to identify genetic variants that are informative for the association between cholesterol and heart disease. The second application considers the demand for health care in Australia with the number of consultations and the number of prescribed medications as a bivariate count response. The third application analyses two dimensions of childhood undernutrition in Nigeria as a bivariate response and we find that the correlation between the two undernutrition scores is considerably different depending on the child's age and the region the child lives in.
Submitted 18 July, 2022;
originally announced July 2022.
-
Boosting Distributional Copula Regression
Authors:
Nicolai Hans,
Nadja Klein,
Florian Faschingbauer,
Michael Schneider,
Andreas Mayr
Abstract:
Capturing complex dependence structures between outcome variables (e.g., study endpoints) is of high relevance in contemporary biomedical data problems and medical research. Distributional copula regression provides a flexible tool to model the joint distribution of multiple outcome variables by disentangling the marginal response distributions and their dependence structure. In a regression setup each parameter of the copula model, i.e. the marginal distribution parameters and the copula dependence parameters, can be related to covariates via structured additive predictors. We propose a framework to fit distributional copula regression models via a model-based boosting algorithm. Model-based boosting is a modern estimation technique that incorporates useful features like an intrinsic variable selection mechanism, parameter shrinkage and the capability to fit regression models in high dimensional data setting, i.e. situations with more covariates than observations. Thus, model-based boosting does not only complement existing Bayesian and maximum-likelihood based estimation frameworks for this model class but rather enables unique intrinsic mechanisms that can be helpful in many applied problems. The performance of our boosting algorithm in the context of copula regression models with continuous margins is evaluated in simulation studies that cover low- and high-dimensional data settings and situations with and without dependence between the responses. Moreover, distributional copula boosting is used to jointly analyze and predict the length and the weight of newborns conditional on sonographic measurements of the fetus before delivery together with other clinical variables.
Submitted 25 February, 2022;
originally announced February 2022.
-
Deselection of Base-Learners for Statistical Boosting -- with an Application to Distributional Regression
Authors:
Annika Strömer,
Christian Staerk,
Nadja Klein,
Leonie Weinhold,
Stephanie Titze,
Andreas Mayr
Abstract:
We present a new procedure for enhanced variable selection for component-wise gradient boosting. Statistical boosting is a computational approach that emerged from machine learning and allows fitting regression models in the presence of high-dimensional data. Furthermore, the algorithm can lead to data-driven variable selection. In practice, however, the final models tend to include too many variables in some situations. This occurs particularly for low-dimensional data (p<n), where we observe a slow overfitting behavior of boosting. As a result, more variables are included in the final model without altering the prediction accuracy. Many of these false positives are incorporated with a small coefficient and therefore have a small impact, but lead to a larger model. We try to overcome this issue by giving the algorithm the chance to deselect base-learners with minor importance. We analyze the impact of the new approach on variable selection and prediction performance in comparison to alternative methods including boosting with earlier stopping as well as twin boosting. We illustrate our approach with data of an ongoing cohort study for chronic kidney disease patients, where the most influential predictors for the health-related quality of life measure are selected in a distributional regression approach based on beta regression.
Submitted 3 February, 2022;
originally announced February 2022.
-
Estimating the course of the COVID-19 pandemic in Germany via spline-based hierarchical modelling of death counts
Authors:
Tobias Wistuba,
Andreas Mayr,
Christian Staerk
Abstract:
The effective reproduction number is a key figure to monitor the course of the COVID-19 pandemic. In this study we consider a retrospective modelling approach for estimating the effective reproduction number based on death counts during the first year of the pandemic in Germany. The proposed Bayesian hierarchical model incorporates splines to estimate reproduction numbers flexibly over time while adjusting for varying effective infection fatality rates. The approach also provides estimates of dark figures regarding undetected infections over time. Results for Germany illustrate that estimated reproduction numbers based on death counts are often similar to classical estimates based on confirmed cases. However, considering death counts proves to be more robust against shifts in testing policies: during the second wave of infections, classical estimation of the reproduction number suggests a flattening/ decreasing trend of infections following the "lockdown light" in November 2020, while our results indicate that true numbers of infections continued to rise until the "second lockdown" in December 2020. This observation is associated with more stringent testing criteria introduced concurrently with the "lockdown light", which is reflected in subsequently increasing dark figures of infections estimated by our model. These findings illustrate that the retrospective viewpoint can provide additional insights regarding the course of the pandemic. In light of progressive vaccinations, shifting the focus from modelling confirmed cases to reported deaths with the possibility to incorporate effective infection fatality rates might be of increasing relevance for the future surveillance of the pandemic.
Submitted 6 September, 2021;
originally announced September 2021.
-
Boundary Graph Neural Networks for 3D Simulations
Authors:
Andreas Mayr,
Sebastian Lehner,
Arno Mayrhofer,
Christoph Kloss,
Sepp Hochreiter,
Johannes Brandstetter
Abstract:
The abundance of data has given machine learning considerable momentum in natural sciences and engineering, though modeling of physical processes is often difficult. A particularly tough problem is the efficient representation of geometric boundaries. Triangularized geometric boundaries are well understood and ubiquitous in engineering applications. However, it is notoriously difficult to integrate them into machine learning approaches due to their heterogeneity with respect to size and orientation. In this work, we introduce an effective theory to model particle-boundary interactions, which leads to our new Boundary Graph Neural Networks (BGNNs) that dynamically modify graph structures to obey boundary conditions. The new BGNNs are tested on complex 3D granular flow processes of hoppers, rotating drums and mixers, which are all standard components of modern industrial machinery but still have complicated geometry. BGNNs are evaluated in terms of computational efficiency as well as prediction accuracy of particle flows and mixing entropies. BGNNs are able to accurately reproduce 3D granular flows within simulation uncertainties over hundreds of thousands of simulation timesteps. Most notably, in our experiments, particles stay within the geometric objects without using handcrafted conditions or restrictions.
Submitted 20 April, 2023; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Learning 3D Granular Flow Simulations
Authors:
Andreas Mayr,
Sebastian Lehner,
Arno Mayrhofer,
Christoph Kloss,
Sepp Hochreiter,
Johannes Brandstetter
Abstract:
Recently, the application of machine learning models has gained momentum in natural sciences and engineering, which is a natural fit due to the abundance of data in these fields. However, the modeling of physical processes from simulation data without first principle solutions remains difficult. Here, we present a Graph Neural Networks approach towards accurate modeling of complex 3D granular flow simulation processes created by the discrete element method LIGGGHTS and concentrate on simulations of physical systems found in real world applications like rotating drums and hoppers. We discuss how to implement Graph Neural Networks that deal with 3D objects, boundary conditions, particle - particle, and particle - boundary interactions such that an accurate modeling of relevant physical quantities is made possible. Finally, we compare the machine learning based trajectories to LIGGGHTS trajectories in terms of particle flows and mixing entropies.
Submitted 4 May, 2021;
originally announced May 2021.
-
Estimating effective infection fatality rates during the course of the COVID-19 pandemic in Germany
Authors:
Christian Staerk,
Tobias Wistuba,
Andreas Mayr
Abstract:
The infection fatality rate (IFR) of the Coronavirus Disease 2019 (COVID-19) is one of the most discussed figures in the context of this pandemic. Using German COVID-19 surveillance data and age-group specific IFR estimates from multiple international studies, this work investigates time-dependent variations in effective IFR over the course of the pandemic. Three different methods for estimating (effective) IFRs are presented: (a) population-averaged IFRs based on the assumption that the infection risk is independent of age and time, (b) effective IFRs based on the assumption that the age distribution of confirmed cases approximately reflects the age distribution of infected individuals, and (c) effective IFRs accounting for age- and time-dependent dark figures of infections. Results show that effective IFRs in Germany are estimated to vary over time, as the age distributions of confirmed cases and estimated infections are changing during the course of the pandemic. In particular during the first and second waves of infections in spring and autumn/winter 2020, there has been a pronounced shift in the age distribution of confirmed cases towards older age groups, resulting in larger effective IFR estimates. The temporary increase in effective IFR during the first wave is estimated to be smaller but still remains when adjusting for age- and time-dependent dark figures. A comparison of effective IFRs with observed CFRs indicates that a substantial fraction of the time-dependent variability in observed mortality can be explained by changes in the age distribution of infections. Furthermore, a vanishing gap between effective IFRs and observed CFRs is apparent after the first infection wave, while a moderately increasing gap can be observed during the second wave. Further research is warranted to obtain timely age-stratified IFR estimates.
Submitted 21 January, 2021; v1 submitted 4 November, 2020;
originally announced November 2020.
-
Large-scale ligand-based virtual screening for SARS-CoV-2 inhibitors using deep neural networks
Authors:
Markus Hofmarcher,
Andreas Mayr,
Elisabeth Rumetshofer,
Peter Ruch,
Philipp Renz,
Johannes Schimunek,
Philipp Seidl,
Andreu Vall,
Michael Widrich,
Sepp Hochreiter,
Günter Klambauer
Abstract:
Due to the current severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, there is an urgent need for novel therapies and drugs. We conducted a large-scale virtual screening for small molecules that are potential CoV-2 inhibitors. To this end, we utilized "ChemAI", a deep neural network trained on more than 220M data points across 3.6M molecules from three public drug-discovery databases. With ChemAI, we screened and ranked one billion molecules from the ZINC database for favourable effects against CoV-2. We then reduced the result to the 30,000 top-ranked compounds, which are readily accessible and purchasable via the ZINC database. Additionally, we screened the DrugBank using ChemAI to allow for drug repurposing, which would be a fast way towards a therapy. We provide these top-ranked compounds of ZINC and DrugBank as a library for further screening with bioassays at https://github.com/ml-jku/sars-cov-inhibitors-chemai.
Submitted 17 August, 2020; v1 submitted 25 March, 2020;
originally announced April 2020.
-
RefCurv: A Software for the Construction of Pediatric Reference Curves
Authors:
Christian Winkler,
Katharina Linden,
Andreas Mayr,
Thomas Schultz,
Thomas Welchowski,
Johannes Breuer,
Ulrike Herberg
Abstract:
In medicine, reference curves serve as an important tool for everyday clinical practice. Pediatricians assess the growth process of children with the help of percentile curves serving as norm references. The mathematical methods for the construction of these reference curves are sophisticated and often require technical knowledge beyond the scope of physicians. An easy-to-use software for life scientists and physicians is missing. As a consequence, most medical publications do not document the construction properly. This project aims to develop a software that enables non-technical users to apply modern statistical methods to create and analyze reference curves. In this paper, we present RefCurv, a software that facilitates the construction of reference curves. The software comprises functionalities to select and visualize data. Users can fit models to the data and graphically present them as percentile curves. Furthermore, the software provides features to highlight possible outliers, perform model selection, and analyze the sensitivity. RefCurv is an open-source software with a graphical user interface (GUI) written in Python. It uses R and the gamlss add-on package (Rigby and Stasinopoulos (2005)) as the underlying statistical engine. In summary, RefCurv is the first software based on the gamlss package, which enables practitioners to construct and analyze reference curves in a user-friendly GUI. In broader terms, the software brings together the fields of statistical learning and medical application. Consequently, RefCurv can help to establish the construction of reference curves in other medical fields.
Submitted 28 January, 2019;
originally announced January 2019.
-
Extension of the Gradient Boosting Algorithm for Joint Modeling of Longitudinal and Time-to-Event data
Authors:
Colin Griesbach,
Andreas Mayr,
Elisabeth Waldmann
Abstract:
In various data situations, joint models are an efficient tool to analyze relationships between time-dependent covariates and event times or to correct for event-dependent dropout occurring in regression analysis. Joint modeling connects a longitudinal and a survival submodel within a single joint likelihood, which can then be maximized by standard optimization methods. The main burdens of these conventional methods are that the computational effort increases rapidly in higher dimensions and that they do not offer special tools for proper variable selection. Gradient boosting techniques are well known among statisticians for addressing exactly these problems; hence, an initial boosting algorithm to fit a basic joint model based on functional gradient descent methods has been proposed. The aim of this work is to extend this algorithm in order to fit a model incorporating baseline covariates affecting solely the survival part of the model. The extended algorithm is evaluated based on low- and high-dimensional simulation runs as well as a data set on AIDS patients, where the longitudinal submodel models the underlying profile of the CD4 cell count, which is then included alongside several baseline covariates in the survival submodel.
Submitted 24 October, 2018;
originally announced October 2018.
-
Gradient boosting in Markov-switching generalized additive models for location, scale and shape
Authors:
Timo Adam,
Andreas Mayr,
Thomas Kneib
Abstract:
We propose a novel class of flexible latent-state time series regression models which we call Markov-switching generalized additive models for location, scale and shape. In contrast to conventional Markov-switching regression models, the presented methodology allows us to model different state-dependent parameters of the response distribution - not only the mean, but also variance, skewness and kurtosis parameters - as potentially smooth functions of a given set of explanatory variables. In addition, the set of possible distributions that can be specified for the response is not limited to the exponential family but additionally includes, for instance, a variety of Box-Cox-transformed, zero-inflated and mixture distributions. We propose an estimation approach based on the EM algorithm, where we use the gradient boosting framework to prevent overfitting while simultaneously performing variable selection. The feasibility of the suggested approach is assessed in simulation experiments and illustrated in a real-data setting, where we model the conditional distribution of the daily average price of energy in Spain over time.
Submitted 17 May, 2018; v1 submitted 6 October, 2017;
originally announced October 2017.
-
Self-Normalizing Neural Networks
Authors:
Günter Klambauer,
Thomas Unterthiner,
Andreas Mayr,
Sepp Hochreiter
Abstract:
Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and, therefore, cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation functions of SNNs are "scaled exponential linear units" (SELUs), which induce self-normalizing properties. Using the Banach fixed-point theorem, we prove that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance -- even under the presence of noise and perturbations. This convergence property of SNNs allows one to (1) train deep networks with many layers, (2) employ strong regularization, and (3) make learning highly robust. Furthermore, for activations not close to unit variance, we prove an upper and lower bound on the variance; thus, vanishing and exploding gradients are impossible. We compared SNNs on (a) 121 tasks from the UCI machine learning repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with standard FNNs and other machine learning methods such as random forests and support vector machines. SNNs significantly outperformed all competing FNN methods at 121 UCI tasks, outperformed all competing methods at the Tox21 dataset, and set a new record at an astronomy data set. The winning SNN architectures are often very deep. Implementations are available at: github.com/bioinf-jku/SNNs.
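As an illustration of the self-normalizing property described in this abstract, the following is a minimal Python sketch of the SELU activation. It is not taken from the authors' repository. The constants ALPHA and SCALE are the commonly published SELU values, and the helper simulate_selu_moments is a hypothetical name introduced here. The sketch only checks by simulation that standard normal inputs are mapped to outputs with mean near zero and variance near one.

```python
import math
import random
import statistics

# Commonly published SELU constants (assumed here, rounded to double precision).
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit applied to a single value."""
    if x > 0:
        return SCALE * x
    return SCALE * ALPHA * (math.exp(x) - 1.0)


def simulate_selu_moments(n_samples: int = 100_000, seed: int = 0):
    """Hypothetical helper: pass N(0, 1) draws through SELU and
    return the empirical mean and variance of the outputs."""
    rng = random.Random(seed)
    values = [selu(rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    return statistics.fmean(values), statistics.pvariance(values)


if __name__ == "__main__":
    mean, variance = simulate_selu_moments()
    print(f"mean={mean:.4f}, variance={variance:.4f}")
```

The simulation covers a single activation step only; the convergence result across many layers stated in the abstract additionally depends on assumptions about the weights that this sketch does not model.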
Submitted 7 September, 2017; v1 submitted 8 June, 2017;
originally announced June 2017.
-
An update on statistical boosting in biomedicine
Authors:
Andreas Mayr,
Benjamin Hofner,
Elisabeth Waldmann,
Tobias Hepp,
Olaf Gefeller,
Matthias Schmid
Abstract:
Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
Submitted 27 February, 2017;
originally announced February 2017.
-
Probing for sparse and fast variable selection with model-based boosting
Authors:
Janek Thomas,
Tobias Hepp,
Andreas Mayr,
Bernd Bischl
Abstract:
We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting lies in the need of multiple model fits on slightly altered data (e.g. cross-validation or bootstrap) to find the optimal number of boosting iterations and prevent overfitting. In our proposed approach, we augment the data set with randomly permuted versions of the true variables, so called shadow variables, and stop the step-wise fitting as soon as such a variable would be added to the model. This allows variable selection in a single fit of the model without requiring further parameter tuning. We show that our probing approach can compete with state-of-the-art selection methods like stability selection in a high-dimensional classification benchmark and apply it on gene expression data for the estimation of riboflavin production of Bacillus subtilis.
Submitted 15 February, 2017;
originally announced February 2017.
-
Stability selection for component-wise gradient boosting in multiple dimensions
Authors:
Janek Thomas,
Andreas Mayr,
Bernd Bischl,
Matthias Schmid,
Adam Smith,
Benjamin Hofner
Abstract:
We present a new algorithm for boosting generalized additive models for location, scale and shape (GAMLSS) that allows stability selection to be incorporated, an increasingly popular way to obtain stable sets of covariates while controlling the per-family error rate (PFER). The model is fitted repeatedly to subsampled data and variables with high selection frequencies are extracted. To apply stability selection to boosted GAMLSS, we develop a new "noncyclical" fitting algorithm that incorporates an additional selection step of the best-fitting distribution parameter in each iteration. This new algorithm has the additional advantage that optimizing the tuning parameters of boosting is reduced from a multi-dimensional to a one-dimensional problem with vastly decreased complexity. The performance of the novel algorithm is evaluated in an extensive simulation study. We apply this new algorithm to a study to estimate abundance of common eider in Massachusetts, USA, featuring excess zeros, overdispersion, non-linearity and spatio-temporal structures. Eider abundance is estimated via boosted GAMLSS, allowing both mean and overdispersion to be regressed on covariates. Stability selection is used to obtain a sparse set of stable predictors.
Submitted 30 November, 2016;
originally announced November 2016.
-
Boosting Joint Models for Longitudinal and Time-to-Event Data
Authors:
Elisabeth Waldmann,
David Taylor-Robinson,
Nadja Klein,
Thomas Kneib,
Tania Pressler,
Matthias Schmid,
Andreas Mayr
Abstract:
Joint Models for longitudinal and time-to-event data have gained a lot of attention in the last few years as they are a helpful technique to approach a common data structure in clinical studies where longitudinal outcomes are recorded alongside event times. Those two processes are often linked and the two outcomes should thus be modeled jointly in order to prevent the potential bias introduced by independent modelling. Commonly, joint models are estimated in likelihood-based expectation maximization or Bayesian approaches using frameworks where variable selection is problematic and which do not immediately work for high-dimensional data. In this paper, we propose a boosting algorithm tackling these challenges by being able to simultaneously estimate predictors for joint models and automatically select the most influential variables even in high-dimensional data situations. We analyse the performance of the new algorithm in a simulation study and apply it to the Danish cystic fibrosis registry which collects longitudinal lung function data on patients with cystic fibrosis together with data regarding the onset of pulmonary infections. This is the first approach to combine state-of-the-art algorithms from the field of machine-learning with the model class of joint models, providing a fully data-driven mechanism to select variables and predictor effects in a unified framework of boosting joint models.
Submitted 22 December, 2016; v1 submitted 9 September, 2016;
originally announced September 2016.
-
Signal Regression Models for Location, Scale and Shape with an Application to Stock Returns
Authors:
Sarah Brockhaus,
Andreas Fuest,
Andreas Mayr,
Sonja Greven
Abstract:
We discuss scalar-on-function regression models where all parameters of the assumed response distribution can be modeled depending on covariates. We thus combine signal regression models with generalized additive models for location, scale and shape (GAMLSS). We compare two fundamentally different methods for estimation, a gradient boosting and a penalized likelihood based approach, and address practically important points like identifiability and model choice. Estimation by a component-wise gradient boosting algorithm allows for high dimensional data settings and variable selection. Estimation by a penalized likelihood based approach has the advantage of directly provided statistical inference. The motivating application is a time series of stock returns where it is of interest to model both the expectation and the variance depending on lagged response values and functional liquidity curves.
Submitted 13 May, 2016;
originally announced May 2016.
-
Toxicity Prediction using Deep Learning
Authors:
Thomas Unterthiner,
Andreas Mayr,
Günter Klambauer,
Sepp Hochreiter
Abstract:
Every day we are exposed to various chemicals via food additives, cleaning and cosmetic products and medicines -- and some of them might be toxic. However, testing the toxicity of all existing compounds by biological experiments is neither financially nor logistically feasible. Therefore, the government agencies NIH, EPA and FDA launched the Tox21 Data Challenge within the "Toxicology in the 21st Century" (Tox21) initiative. The goal of this challenge was to assess the performance of computational methods in predicting the toxicity of chemical compounds. State of the art toxicity prediction methods build upon specifically-designed chemical descriptors developed over decades. Though Deep Learning is new to the field and was never applied to toxicity prediction before, it clearly outperformed all other participating methods. In this application paper we show that deep nets automatically learn features resembling well-established toxicophores. In total, our Deep Learning approach won both of the panel-challenges (nuclear receptors and stress response) as well as the overall Grand Challenge, and thereby sets a new standard in tox prediction.
Submitted 4 March, 2015;
originally announced March 2015.
-
Rectified Factor Networks
Authors:
Djork-Arné Clevert,
Andreas Mayr,
Thomas Unterthiner,
Sepp Hochreiter
Abstract:
We propose rectified factor networks (RFNs) to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events in the input, have a low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method which enforces non-negative and normalized posterior means. We prove convergence and correctness of the RFN learning algorithm. On benchmarks, RFNs are compared to other unsupervised methods like autoencoders, RBMs, factor analysis, ICA, and PCA. In contrast to previous sparse coding methods, RFNs yield sparser codes, capture the data's covariance structure more precisely, and have a significantly smaller reconstruction error. We test RFNs as pretraining technique for deep networks on different vision datasets, where RFNs were superior to RBMs and autoencoders. On gene expression data from two pharmaceutical drug discovery studies, RFNs detected small and rare gene modules that revealed highly relevant new biological insights which were so far missed by other unsupervised methods.
Submitted 11 June, 2015; v1 submitted 23 February, 2015;
originally announced February 2015.
-
gamboostLSS: An R Package for Model Building and Variable Selection in the GAMLSS Framework
Authors:
Benjamin Hofner,
Andreas Mayr,
Matthias Schmid
Abstract:
Generalized additive models for location, scale and shape (GAMLSS) are a flexible class of regression models that allow to model multiple parameters of a distribution function, such as the mean and the standard deviation, simultaneously. With the R package gamboostLSS, we provide a boosting method to fit these models. Variable selection and model choice are naturally available within this regularized regression framework. To introduce and illustrate the R package gamboostLSS and its infrastructure, we use a data set on stunted growth in India. In addition to the specification and application of the model itself, we present a variety of convenience functions, including methods for tuning parameter selection, prediction and visualization of results. The package gamboostLSS is available from CRAN (http://cran.r-project.org/package=gamboostLSS).
Submitted 7 July, 2014;
originally announced July 2014.
-
Extending Statistical Boosting - An Overview of Recent Methodological Developments
Authors:
Andreas Mayr,
Harald Binder,
Olaf Gefeller,
Matthias Schmid
Abstract:
Boosting algorithms to simultaneously estimate and select predictor effects in statistical models have gained substantial interest during the last decade. This review article aims to highlight recent methodological developments regarding boosting algorithms for statistical modelling especially focusing on topics relevant for biomedical research. We suggest a unified framework for gradient boosting and likelihood-based boosting (statistical boosting) which have been addressed strictly separated in the literature up to now. Statistical boosting algorithms have been adapted to carry out unbiased variable selection and automated model choice during the fitting process and can nowadays be applied in almost any possible type of regression setting in combination with a large amount of different types of predictor effects. The methodological developments on statistical boosting during the last ten years can be grouped into three different lines of research: (i) efforts to ensure variable selection leading to sparser models, (ii) developments regarding different types of predictor effects and their selection (model choice), (iii) approaches to extend the statistical boosting framework to new regression settings.
Submitted 18 November, 2014; v1 submitted 7 March, 2014;
originally announced March 2014.
-
The Evolution of Boosting Algorithms - From Machine Learning to Statistical Modelling
Authors:
Andreas Mayr,
Harald Binder,
Olaf Gefeller,
Matthias Schmid
Abstract:
The concept of boosting emerged from the field of machine learning. The basic idea is to boost the accuracy of a weak classifying tool by combining various instances into a more accurate prediction. This general concept was later adapted to the field of statistical modelling. This review article attempts to highlight this evolution of boosting algorithms from machine learning to statistical modelling. We describe the AdaBoost algorithm for classification as well as the two most prominent statistical boosting approaches, gradient boosting and likelihood-based boosting. Although both approaches are typically treated separately in the literature, they share the same methodological roots and follow the same fundamental concepts. Compared to the initial machine learning algorithms, which must be seen as black-box prediction schemes, statistical boosting results in statistical models which offer a straightforward interpretation. We highlight the methodological background and present the most common software implementations. Worked-out examples and corresponding R code can be found in the Appendix.
Submitted 18 November, 2014; v1 submitted 6 March, 2014;
originally announced March 2014.
-
Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations
Authors:
Andreas Mayr,
Matthias Schmid
Abstract:
The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are only suboptimal regarding the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a component-wise boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures.
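To make the concordance index mentioned in this abstract concrete, here is a minimal Python sketch of one common estimator for right-censored data, a simple pairwise (Harrell-type) count. This is an illustrative assumption, not the estimator or the smoothed criterion used in the paper, and the function name concordance_index and its arguments are hypothetical. A pair is treated as comparable when the subject with the shorter observed time had an event, and it counts as concordant when that subject also has the higher risk score.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    scores: Sequence[float],
) -> float:
    """Pairwise concordance for right-censored data (illustrative sketch).

    times  : observed follow-up times
    events : True if the event was observed, False if censored
    scores : predicted risk scores (higher score = earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored subject cannot serve as the earlier member of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties in the score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]
    events = [True, True, False, True]
    print(concordance_index(times, events, [4.0, 3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to a score with no discriminatory power and 1.0 to a perfect ranking. Because this count is a step function of the scores, it cannot be optimized directly by gradient-based methods, which is one motivation for working with a smoothed version as the abstract describes.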
Submitted 25 October, 2013; v1 submitted 24 July, 2013;
originally announced July 2013.