Showing 1–27 of 27 results for author: Schmid, M

Searching in archive stat.
  1. arXiv:2503.16981  [pdf, other]

    stat.ME

    A categorization of performance measures for estimated non-linear associations between an outcome and continuous predictors

    Authors: Theresa Ullmann, Georg Heinze, Michal Abrahamowicz, Aris Perperoglou, Willi Sauerbrei, Matthias Schmid, Daniela Dunkler, for TG2 of the STRATOS initiative

    Abstract: In regression analysis, associations between continuous predictors and the outcome are often assumed to be linear. However, modeling the associations as non-linear can improve model fit. Many flexible modeling techniques, like (fractional) polynomials and spline-based approaches, are available. Such methods can be systematically compared in simulation studies, which require suitable performance me…

    Submitted 21 March, 2025; originally announced March 2025.

  2. arXiv:2501.12787  [pdf, other]

    stat.ME

    Flexible tree-structured regression for clustered data with an application to quality of life in older adults

    Authors: Nikolai Spuck, Matthias Schmid, Moritz Berger

    Abstract: Tree-structured models are a powerful alternative to parametric regression models if non-linear effects and interactions are present in the data. Yet, classical tree-structured models might not be appropriate if data comes in clusters of units, which requires taking the dependence of observations into account. This is, for example, the case in cross-national studies, as presented here, where count…

    Submitted 22 January, 2025; originally announced January 2025.

    MSC Class: 62J02; 62P25

  3. arXiv:2411.01381  [pdf, other]

    stat.ME

    Modeling the restricted mean survival time using pseudo-value random forests

    Authors: Alina Schenk, Vanessa Basten, Matthias Schmid

    Abstract: The restricted mean survival time (RMST) has become a popular measure to summarize event times in longitudinal studies. Defined as the area under the survival function up to a time horizon $τ$ > 0, the RMST can be interpreted as the life expectancy within the time interval [0, $τ$]. In addition to its straightforward interpretation, the RMST also allows for the definition of valid estimands for th…

    Submitted 2 November, 2024; originally announced November 2024.

  4. arXiv:2407.18650  [pdf, other]

    stat.ML cs.LG

    Achieving interpretable machine learning by functional decomposition of black-box models into explainable predictor effects

    Authors: David Köhler, David Rügamer, Matthias Schmid

    Abstract: Machine learning (ML) has seen significant growth in both popularity and importance. The high prediction accuracy of ML models is often achieved through complex black-box architectures that are difficult to interpret. This interpretability problem has been hindering the use of ML in fields like medicine, ecology and insurance, where an understanding of the inner workings of the model is paramount…

    Submitted 26 July, 2024; originally announced July 2024.

  5. arXiv:2406.19887  [pdf, other]

    stat.ME

    Confidence intervals for tree-structured varying coefficients

    Authors: Nikolai Spuck, Matthias Schmid, Malte Monin, Moritz Berger

    Abstract: The tree-structured varying coefficient model (TSVC) is a flexible regression approach that allows the effects of covariates to vary with the values of the effect modifiers. Relevant effect modifiers are identified inherently using recursive partitioning techniques. To quantify uncertainty in TSVC models, we propose a procedure to construct confidence intervals of the estimated partition-specific…

    Submitted 28 June, 2024; originally announced June 2024.

  6. arXiv:2312.00439  [pdf, other]

    stat.ME

    Modeling the Ratio of Correlated Biomarkers Using Copula Regression

    Authors: Moritz Berger, Nadja Klein, Michael Wagner, Matthias Schmid

    Abstract: Modeling the ratio of two dependent components as a function of covariates is a frequently pursued objective in observational research. Despite the high relevance of this topic in medical studies, where biomarker ratios are often used as surrogate endpoints for specific diseases, existing models are based on oversimplified assumptions, assuming e.g. independence or strictly positive associations…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 32 pages, 6 figures, 5 tables

  7. arXiv:2310.20409  [pdf, other]

    stat.ME

    Detection of nonlinearity, discontinuity and interactions in generalized regression models

    Authors: Nikolai Spuck, Matthias Schmid, Moritz Berger

    Abstract: In generalized regression models the effect of continuous covariates is commonly assumed to be linear. This assumption, however, may be too restrictive in applications and may lead to biased effect estimates and decreased predictive ability. While a multitude of alternatives for the flexible modeling of continuous covariates have been proposed, methods that provide guidance for choosing a suitable…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 30 pages, 8 figures

  8. arXiv:2212.01613  [pdf, other]

    stat.ME

    Accounting for Time Dependency in Meta-Analyses of Concordance Probability Estimates

    Authors: Matthias Schmid, Tim Friede, Nadja Klein, Leonie Weinhold

    Abstract: Recent years have seen the development of many novel scoring tools for disease prognosis and prediction. To become accepted for use in clinical applications, these tools have to be validated on external data. In practice, validation is often hampered by logistical issues, resulting in multiple small-sized validation studies. It is therefore necessary to synthesize the results of these studies usin…

    Submitted 3 December, 2022; originally announced December 2022.

  9. arXiv:2210.12157  [pdf, other]

    cs.CV cs.RO stat.AP

    Error-Covariance Analysis of Monocular Pose Estimation Using Total Least Squares

    Authors: Saeed Maleki, John Crassidis, Yang Cheng, Matthias Schmid

    Abstract: This study presents a theoretical structure for the monocular pose estimation problem using the total least squares. The unit-vector line-of-sight observations of the features are extracted from the monocular camera images. First, the optimization framework is formulated for the pose estimation problem with observation vectors extracted from unit vectors from the camera center-of-projection, point…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: text overlap with arXiv:2106.11522, arXiv:2210.11697

  10. arXiv:2210.11697  [pdf, other]

    cs.RO stat.AP

    Optimal Pose Estimation and Covariance Analysis with Simultaneous Localization and Mapping Applications

    Authors: Saeed Maleki, Adhiti Raman, Yang Cheng, John Crassidis, Matthias Schmid

    Abstract: This work provides a theoretical analysis for optimally solving the pose estimation problem using total least squares for vector observations from landmark features, which is central to applications involving simultaneous localization and mapping. First, the optimization process is formulated with observation vectors extracted from point-cloud features. Then, error-covariance expressions are deriv…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2106.11522

  11. arXiv:2209.06592  [pdf, other]

    stat.ME stat.ML

    Model-based recursive partitioning for discrete event times

    Authors: Cynthia Huber, Matthias Schmid, Tim Friede

    Abstract: Model-based recursive partitioning (MOB) is a semi-parametric statistical approach allowing the identification of subgroups that can be combined with a broad range of outcome measures including continuous time-to-event outcomes. When time is measured on a discrete scale, methods and models need to account for this discreteness as otherwise subgroups might be spurious and effects biased. The test u…

    Submitted 14 September, 2022; originally announced September 2022.

  12. arXiv:2004.09677  [pdf, other]

    cs.LG stat.ML

    Approximate exploitability: Learning a best response in large games

    Authors: Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, Michael Bowling

    Abstract: Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically…

    Submitted 3 November, 2022; v1 submitted 20 April, 2020; originally announced April 2020.

  13. arXiv:2001.11240  [pdf, other]

    stat.ME

    Assessing the Calibration of Subdistribution Hazard Models in Discrete Time

    Authors: Moritz Berger, Matthias Schmid

    Abstract: The generalization performance of a risk prediction model can be evaluated by its calibration, which measures the agreement between predicted and observed outcomes on external validation data. Here, methods for assessing the calibration of discrete time-to-event models in the presence of competing risks are proposed. The methods are designed for the class of discrete subdistribution hazard models,…

    Submitted 30 January, 2020; originally announced January 2020.

    Comments: 23 pages, 5 figures

  14. arXiv:1907.00786  [pdf]

    stat.ME

    State-of-the-art in selection of variables and functional forms in multivariable analysis -- outstanding issues

    Authors: Willi Sauerbrei, Aris Perperoglou, Matthias Schmid, Michal Abrahamowicz, Heiko Becher, Harald Binder, Daniela Dunkler, Frank E. Harrell Jr, Patrick Royston, Georg Heinze

    Abstract: How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc 'traditional' approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these…

    Submitted 1 July, 2019; originally announced July 2019.

  15. arXiv:1901.06211  [pdf, other]

    stat.ME

    A Random Forest Approach for Modeling Bounded Outcomes

    Authors: Leonie Weinhold, Matthias Schmid, Marvin N. Wright, Moritz Berger

    Abstract: Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of complex predictor-response relationships. For bounded outcome variables restricted to the unit interval, however, classical random forest approaches may severely suffer as they do not account for the heteroscedasticity in the data. A random forest appr…

    Submitted 18 January, 2019; originally announced January 2019.

    Comments: 19 pages, 5 figures

  16. arXiv:1802.08178  [pdf, other]

    stat.ME

    Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection

    Authors: Thomas Welchowski, Verena Zuber, Matthias Schmid

    Abstract: Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis it is often necessary to discriminate between influential and non-influential markers. Usually, the first step is to perform a univariate screening step that ranks the markers according to their associations with the outcome.…

    Submitted 24 February, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

  17. arXiv:1705.08699  [pdf, other]

    stat.ME

    Tree-Structured Modelling of Varying Coefficients

    Authors: Moritz Berger, Gerhard Tutz, Matthias Schmid

    Abstract: The varying-coefficient model is a strong tool for the modelling of interactions in generalized regression. It is easy to apply if both the variables that are modified as well as the effect modifiers are known. However, in general one has a set of explanatory variables and it is unknown which variables are modified by which covariates. A recursive partitioning strategy is proposed that is able to…

    Submitted 24 May, 2017; originally announced May 2017.

    Comments: 20 pages, 4 figures, 10 tables

  18. arXiv:1704.04087  [pdf, other]

    stat.AP

    Semiparametric Regression for Discrete Time-to-Event Data

    Authors: Moritz Berger, Matthias Schmid

    Abstract: Time-to-event models are a popular tool to analyse data where the outcome variable is the time to the occurrence of a specific event of interest. Here we focus on the analysis of time-to-event outcomes that are either intrinsically discrete or grouped versions of continuous event times. In the literature, there exists a variety of regression methods for such data. This tutorial provides an introduc…

    Submitted 13 April, 2017; originally announced April 2017.

    Comments: 35 pages, 2 tables, 6 figures

  19. arXiv:1702.08185  [pdf, ps, other]

    stat.AP stat.CO stat.ML

    An update on statistical boosting in biomedicine

    Authors: Andreas Mayr, Benjamin Hofner, Elisabeth Waldmann, Tobias Hepp, Olaf Gefeller, Matthias Schmid

    Abstract: Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type o…

    Submitted 27 February, 2017; originally announced February 2017.

  20. Stability selection for component-wise gradient boosting in multiple dimensions

    Authors: Janek Thomas, Andreas Mayr, Bernd Bischl, Matthias Schmid, Adam Smith, Benjamin Hofner

    Abstract: We present a new algorithm for boosting generalized additive models for location, scale and shape (GAMLSS) that allows to incorporate stability selection, an increasingly popular way to obtain stable sets of covariates while controlling the per-family error rate (PFER). The model is fitted repeatedly to subsampled data and variables with high selection frequencies are extracted. To apply stability…

    Submitted 30 November, 2016; originally announced November 2016.

    Comments: 16 pages

  21. arXiv:1609.02686  [pdf, other]

    stat.ML stat.ME

    Boosting Joint Models for Longitudinal and Time-to-Event Data

    Authors: Elisabeth Waldmann, David Taylor-Robinson, Nadja Klein, Thomas Kneib, Tania Pressler, Matthias Schmid, Andreas Mayr

    Abstract: Joint models for longitudinal and time-to-event data have gained a lot of attention in the last few years as they are a helpful technique to approach a common data structure in clinical studies where longitudinal outcomes are recorded alongside event times. Those two processes are often linked and the two outcomes should thus be modeled jointly in order to prevent the potential bias introduced by…

    Submitted 22 December, 2016; v1 submitted 9 September, 2016; originally announced September 2016.

  22. arXiv:1607.07028  [pdf, ps, other]

    stat.ME

    A Statistical Model for the Analysis of Beta Values in DNA Methylation Studies

    Authors: Leonie Weinhold, Simone Wahl, Matthias Schmid

    Abstract: Background: The analysis of DNA methylation is a key component in the development of personalized treatment approaches. A common way to measure DNA methylation is the calculation of beta values, which are bounded variables of the form M / (M + U) that are generated by Illumina's 450k BeadChip array. The statistical analysis of beta values is considered to be challenging, as traditional methods for…

    Submitted 24 July, 2016; originally announced July 2016.

  23. arXiv:1507.03092  [pdf, other]

    stat.ML

    On the use of Harrell's C for clinical risk prediction via random survival forests

    Authors: Matthias Schmid, Marvin Wright, Andreas Ziegler

    Abstract: Random survival forests (RSF) are a powerful method for risk prediction of right-censored outcomes in biomedical research. RSF use the log-rank split criterion to form an ensemble of survival trees. The most common approach to evaluate the prediction accuracy of a RSF model is Harrell's concordance index for survival data ('C index'). Conceptually, this strategy implies that the split criterion in…

    Submitted 18 July, 2016; v1 submitted 11 July, 2015; originally announced July 2015.

  24. arXiv:1407.1774  [pdf, other]

    stat.CO

    gamboostLSS: An R Package for Model Building and Variable Selection in the GAMLSS Framework

    Authors: Benjamin Hofner, Andreas Mayr, Matthias Schmid

    Abstract: Generalized additive models for location, scale and shape (GAMLSS) are a flexible class of regression models that allow to model multiple parameters of a distribution function, such as the mean and the standard deviation, simultaneously. With the R package gamboostLSS, we provide a boosting method to fit these models. Variable selection and model choice are naturally available within this regulari…

    Submitted 7 July, 2014; originally announced July 2014.

  25. Extending Statistical Boosting - An Overview of Recent Methodological Developments

    Authors: Andreas Mayr, Harald Binder, Olaf Gefeller, Matthias Schmid

    Abstract: Boosting algorithms to simultaneously estimate and select predictor effects in statistical models have gained substantial interest during the last decade. This review article aims to highlight recent methodological developments regarding boosting algorithms for statistical modelling especially focusing on topics relevant for biomedical research. We suggest a unified framework for gradient boosting…

    Submitted 18 November, 2014; v1 submitted 7 March, 2014; originally announced March 2014.

    Journal ref: Methods Inf Med 2014; 53(6): 428-435

  26. The Evolution of Boosting Algorithms - From Machine Learning to Statistical Modelling

    Authors: Andreas Mayr, Harald Binder, Olaf Gefeller, Matthias Schmid

    Abstract: The concept of boosting emerged from the field of machine learning. The basic idea is to boost the accuracy of a weak classifying tool by combining various instances into a more accurate prediction. This general concept was later adapted to the field of statistical modelling. This review article attempts to highlight this evolution of boosting algorithms from machine learning to statistical modell…

    Submitted 18 November, 2014; v1 submitted 6 March, 2014; originally announced March 2014.

    Journal ref: Methods Inf Med 2014; 53(6): 419-427

  27. arXiv:1307.6417  [pdf, ps, other]

    stat.AP stat.ME stat.ML

    Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations

    Authors: Andreas Mayr, Matthias Schmid

    Abstract: The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, e…

    Submitted 25 October, 2013; v1 submitted 24 July, 2013; originally announced July 2013.

    Comments: revised manuscript - added simulation study, additional results

    Journal ref: PloS ONE 2014, 9(1): e84483