Showing 1–29 of 29 results for author: Boulesteix, A

Searching in archive stat.
  1. arXiv:2504.04864

    stat.ME

    Statistical parametric simulation studies based on real data

    Authors: Christina Sauer, F. Julian D. Lange, Maria Thurow, Ina Dormuth, Anne-Laure Boulesteix

    Abstract: Simulation studies are indispensable for evaluating and comparing statistical methods. The most common simulation approach is parametric simulation, where the data-generating mechanism (DGM) corresponds to a predefined parametric model from which observations are drawn. Many statistical simulation studies aim to provide practical recommendations on a method's suitability for a given application; h…

    Submitted 2 June, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  2. arXiv:2503.08124

    stat.ME

    On "confirmatory" methodological research in statistics and related fields

    Authors: F. J. D. Lange, Juliane C. Wilcke, Sabine Hoffmann, Moritz Herrmann, Anne-Laure Boulesteix

    Abstract: Empirical substantive research, such as in the life or social sciences, is commonly categorized into the two modes exploratory and confirmatory, both of which are essential to scientific progress. The former is also referred to as hypothesis-generating or data-contingent research, the latter is also called hypothesis-testing research. In the context of empirical methodological research in statisti…

    Submitted 17 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  3. arXiv:2503.03484

    stat.OT

    The impact of the storytelling fallacy on real data examples in methodological research

    Authors: Maximilian M. Mandl, Frank Weber, Tobias Wöhrle, Anne-Laure Boulesteix

    Abstract: The term "researcher degrees of freedom" (RDF), which was introduced in metascientific literature in the context of the replication crisis in science, refers to the extent of flexibility a scientist has in making decisions related to data analysis. These choices occur at all stages of the data analysis process. In combination with selective reporting, RDF may lead to over-optimistic statements and…

    Submitted 5 March, 2025; originally announced March 2025.

  4. arXiv:2502.14716

    stat.ME

    Outlier Detection in Mendelian Randomisation

    Authors: Maximilian M Mandl, Anne-Laure Boulesteix, Stephen Burgess, Verena Zuber

    Abstract: Mendelian Randomisation (MR) uses genetic variants as instrumental variables to infer causal effects of exposures on an outcome. One key assumption of MR is that the genetic variants used as instrumental variables are independent of the outcome conditional on the risk factor and unobserved confounders. Violations of this assumption, i.e. the effect of the instrumental variables on the outcome thro…

    Submitted 20 February, 2025; originally announced February 2025.

  5. arXiv:2412.03491

    stat.ML cs.LG stat.ME

    Beyond algorithm hyperparameters: on preprocessing hyperparameters and associated pitfalls in machine learning applications

    Authors: Christina Sauer, Anne-Laure Boulesteix, Luzia Hanßum, Farina Hodiamont, Claudia Bausewein, Theresa Ullmann

    Abstract: Adequately generating and evaluating prediction models based on supervised machine learning (ML) is often challenging, especially for less experienced users in applied research areas. Special attention is required in settings where the model generation process involves hyperparameter tuning, i.e. data-driven optimization of different types of hyperparameters to improve the predictive performance o…

    Submitted 4 December, 2024; originally announced December 2024.

  6. arXiv:2409.18836

    stat.ML cs.LG

    Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study

    Authors: Hannah Schulz-Kümpel, Sebastian Fischer, Roman Hornung, Anne-Laure Boulesteix, Thomas Nagler, Bernd Bischl

    Abstract: When assessing the quality of prediction models in machine learning, confidence intervals (CIs) for the generalization error, which measures predictive performance, are a crucial tool. Luckily, there exist many methods for computing such CIs and new promising approaches are continuously being proposed. Typically, these methods combine various resampling procedures, most popular among them cross-va…

    Submitted 15 January, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

  7. arXiv:2408.11594

    stat.ME

    Rethinking the handling of method failure in comparison studies

    Authors: Milena Wünsch, Moritz Herrmann, Elisa Noltenius, Mattia Mohr, Tim P. Morris, Anne-Laure Boulesteix

    Abstract: Comparison studies in methodological research are intended to compare methods in an evidence-based manner to help data analysts select a suitable method for their application. To provide trustworthy evidence, they must be carefully designed, implemented, and reported, especially given the many decisions made in planning and running. A common challenge in comparison studies is to handle the "failur…

    Submitted 4 July, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

  8. arXiv:2405.02200

    cs.LG stat.ML

    Position: Why We Must Rethink Empirical Research in Machine Learning

    Authors: Moritz Herrmann, F. Julian D. Lange, Katharina Eggensperger, Giuseppe Casalicchio, Marcel Wever, Matthias Feurer, David Rügamer, Eyke Hüllermeier, Anne-Laure Boulesteix, Bernd Bischl

    Abstract: We warn against a common but incomplete understanding of empirical research in machine learning that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue…

    Submitted 25 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: 20 pages, accepted for publication at ICML 2024, camera-ready version

  9. arXiv:2402.18612

    stat.ME cs.CY cs.LG

    Understanding overfitting in random forest for probability estimation: a visualization and simulation study

    Authors: Lasai Barreñada, Paula Dhiman, Dirk Timmerman, Anne-Laure Boulesteix, Ben Van Calster

    Abstract: Random forests have become popular for clinical risk prediction modelling. In a case study on predicting ovarian malignancy, we observed training c-statistics close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behaviour of random forests by (1) visualizing data space in three real world case studies and (2) a simulation study. For t…

    Submitted 30 September, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 20 pages, 8 figures

    Journal ref: Diagn Progn Res 8, 14 (2024)

  10. arXiv:2402.00754

    stat.AP

    To tweak or not to tweak. How exploiting flexibilities in gene set analysis leads to over-optimism

    Authors: Milena Wünsch, Christina Sauer, Moritz Herrmann, Ludwig Christian Hinske, Anne-Laure Boulesteix

    Abstract: Gene set analysis, a popular approach for analysing high-throughput gene expression data, aims to identify sets of genes that show enriched expression patterns between two conditions. In addition to the multitude of methods available for this task, users are typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibili…

    Submitted 1 February, 2024; originally announced February 2024.

  11. Addressing researcher degrees of freedom through minP adjustment

    Authors: Maximilian M Mandl, Andrea S Becker-Pennrich, Ludwig C Hinske, Sabine Hoffmann, Anne-Laure Boulesteix

    Abstract: When different researchers study the same research question using the same dataset they may obtain different and potentially even conflicting results. This is because there is often substantial flexibility in researchers' analytical choices, an issue also referred to as "researcher degrees of freedom". Combined with selective reporting of the smallest p-value or largest effect, researcher degree…

    Submitted 21 January, 2024; originally announced January 2024.

    Journal ref: BMC Med Res Methodol 24, 152 (2024)

  12. arXiv:2310.15108

    stat.ML cs.LG stat.AP stat.CO stat.ME

    Evaluating machine learning models in non-standard settings: An overview and new findings

    Authors: Roman Hornung, Malte Nalenz, Lennart Schneider, Andreas Bender, Ludwig Bothmann, Bernd Bischl, Thomas Augustin, Anne-Laure Boulesteix

    Abstract: Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines fo…

    Submitted 23 October, 2023; originally announced October 2023.

  13. arXiv:2308.15171

    stat.AP

    From RNA sequencing measurements to the final results: a practical guide to navigating the choices and uncertainties of gene set analysis

    Authors: Milena Wünsch, Christina Sauer, Patrick Callahan, Ludwig Christian Hinske, Anne-Laure Boulesteix

    Abstract: Gene set analysis, a popular approach for analyzing high-throughput gene expression data, aims to identify sets of related genes that show significantly enriched or depleted expression patterns between different conditions. In the last years, a multitude of methods and corresponding tools have been developed for this task. However, clear guidance is lacking: choosing the right method is the first…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 52 pages, 4 figures

  14. arXiv:2302.03991

    q-bio.GN cs.AI cs.LG stat.AP stat.CO

    Prediction approaches for partly missing multi-omics covariate data: A literature review and an empirical comparison study

    Authors: Roman Hornung, Frederik Ludwigs, Jonas Hagenberg, Anne-Laure Boulesteix

    Abstract: As the availability of omics data has increased in the last few years, more multi-omics data have been generated, that is, high-dimensional molecular data consisting of several types such as genomic, transcriptomic, or proteomic data, all obtained from the same patients. Such data lend themselves to being used as covariates in automatic outcome prediction because each omics type may contribute uni…

    Submitted 8 February, 2023; originally announced February 2023.

  15. arXiv:2301.11791

    stat.CO

    Improving Software Engineering in Biostatistics: Challenges and Opportunities

    Authors: Daniel Sabanés Bové, Heidi Seibold, Anne-Laure Boulesteix, Juliane Manitz, Alessandro Gasparini, Burak K. Günhan, Oliver Boix, Armin Schüler, Sven Fillinger, Sven Nahnsen, Anna E. Jacob, Thomas Jaki

    Abstract: Programming is ubiquitous in applied biostatistics; adopting software engineering skills will help biostatisticians do a better job. To explain this, we start by highlighting key challenges for software development and application in biostatistics. Silos between different statistician roles, projects, departments, and organizations lead to the development of duplicate and suboptimal code. Building…

    Submitted 24 January, 2023; originally announced January 2023.

  16. Phases of methodological research in biostatistics - building the evidence base for new methods

    Authors: Georg Heinze, Anne-Laure Boulesteix, Michael Kammer, Tim P. Morris, Ian R. White

    Abstract: Although the biostatistical scientific literature publishes new methods at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similarly to the well-known phases of clinical research in drug development, we define fou…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 14 pages

    Report number: 01

    MSC Class: 62A01 (Primary)

  17. Explaining the optimistic performance evaluation of newly proposed methods: a cross-design validation experiment

    Authors: Christina Nießl, Sabine Hoffmann, Theresa Ullmann, Anne-Laure Boulesteix

    Abstract: The constant development of new data analysis methods in many fields of research is accompanied by an increasing awareness that these new methods often perform better in their introductory paper than in subsequent comparison studies conducted by other researchers. We attempt to explain this discrepancy by conducting a systematic experiment that we call "cross-design validation of methods". In the…

    Submitted 5 September, 2022; originally announced September 2022.

    Journal ref: Biometrical Journal 66(1) (2024), 2200238

  18. arXiv:2107.05847

    stat.ML cs.LG

    Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

    Authors: Bernd Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, Marc Becker, Anne-Laure Boulesteix, Difan Deng, Marius Lindauer

    Abstract: Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for superv…

    Submitted 24 November, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

  19. Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results

    Authors: Christina Nießl, Moritz Herrmann, Chiara Wiedemann, Giuseppe Casalicchio, Anne-Laure Boulesteix

    Abstract: In recent years, the need for neutral benchmark studies that focus on the comparison of methods from computational sciences has been increasingly recognised by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, certain amounts of flexibility always exist. This includes the choice of data sets and performance mea…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: 39 pages

    Journal ref: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12(2) (2022), e1441

  20. arXiv:2103.01281

    stat.ME

    Validation of cluster analysis results on validation data: A systematic framework

    Authors: Theresa Ullmann, Christian Hennig, Anne-Laure Boulesteix

    Abstract: Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To judge the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to valid…

    Submitted 10 January, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: 32 pages, 1 figure

  21. arXiv:2003.03621

    stat.ML cs.LG stat.AP stat.ME

    Large-scale benchmark study of survival prediction methods using multi-omics data

    Authors: Moritz Herrmann, Philipp Probst, Roman Hornung, Vindi Jurinovic, Anne-Laure Boulesteix

    Abstract: Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables (often in addition to classical clinical variables), are increasingly generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are…

    Submitted 7 March, 2020; originally announced March 2020.

    Comments: 23 pages, 6 tables, 3 figures

    Journal ref: Briefings in Bioinformatics (2020) bbaa167

  22. arXiv:1812.00661

    q-bio.QM stat.AP

    Essential guidelines for computational method benchmarking

    Authors: Lukas M. Weber, Wouter Saelens, Robrecht Cannoodt, Charlotte Soneson, Alexander Hapfelmeier, Paul P. Gardner, Anne-Laure Boulesteix, Yvan Saeys, Mark D. Robinson

    Abstract: In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods f…

    Submitted 3 June, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

    Comments: Minor updates

  23. Benchmarking in cluster analysis: A white paper

    Authors: Iven Van Mechelen, Anne-Laure Boulesteix, Rainer Dangl, Nema Dean, Isabelle Guyon, Christian Hennig, Friedrich Leisch, Douglas Steinley

    Abstract: Note: A revised version of this is now published. Please cite and read (it's open access): Van Mechelen, I., Boulesteix, A.-L., Dangl, R., Dean, N., Hennig, C., Leisch, F., Steinley, D., Warrens, M. J. (2023). A white paper on good research practices in benchmarking: The case of cluster analysis. WIREs Data Mining and Knowledge Discovery, e1511. https://doi.org/10.1002/widm.1511 To achieve scien…

    Submitted 30 July, 2023; v1 submitted 27 September, 2018; originally announced September 2018.

    MSC Class: 62H30

    Journal ref: WIREs Data Mining and Knowledge Discovery, 2023, e1511

  24. arXiv:1804.03515

    stat.ML cs.LG

    Hyperparameters and Tuning Strategies for Random Forest

    Authors: Philipp Probst, Marvin Wright, Anne-Laure Boulesteix

    Abstract: The random forest algorithm (RF) has several hyperparameters that have to be set by the user, e.g., the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain and the number of trees. In this paper, we first provide a…

    Submitted 26 February, 2019; v1 submitted 10 April, 2018; originally announced April 2018.

    Comments: 19 pages, 2 figures

    Journal ref: WIREs Data Mining Knowl Discov 2019

  25. arXiv:1802.09596

    stat.ML

    Tunability: Importance of Hyperparameters of Machine Learning Algorithms

    Authors: Philipp Probst, Bernd Bischl, Anne-Laure Boulesteix

    Abstract: Modern supervised machine learning algorithms involve hyperparameters that have to be set before running them. Options for setting hyperparameters are default values from the software package, manual configuration by the user or configuring them for optimal predictive performance by a tuning procedure. The goal of this paper is two-fold. Firstly, we formalize the problem of tuning from a statistic…

    Submitted 22 October, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

    Comments: 22 pages, 10 tables, 8 figures

  26. arXiv:1705.05654

    stat.ML cs.LG

    To tune or not to tune the number of trees in random forest?

    Authors: Philipp Probst, Anne-Laure Boulesteix

    Abstract: The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimu…

    Submitted 16 May, 2017; originally announced May 2017.

    Comments: 20 pages, 4 figures

    Journal ref: Journal of Machine Learning Research 18 (2018) 1-18

  27. arXiv:1310.8203

    math.ST stat.ML

    A U-statistic estimator for the variance of resampling-based error estimators

    Authors: Mathias Fuchs, Roman Hornung, Riccardo De Bin, Anne-Laure Boulesteix

    Abstract: We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Thus, it has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, there is an unbiased estimator for this minimal variance if…

    Submitted 18 December, 2013; v1 submitted 30 October, 2013; originally announced October 2013.

    Comments: 15 pages, no figures

    MSC Class: Primary: 62 G 09; secondary: 62 G 10; 62 H 15; 62 E 20

  28. arXiv:1208.2651

    stat.CO cs.CV stat.ME stat.ML

    A Plea for Neutral Comparison Studies in Computational Sciences

    Authors: Anne-Laure Boulesteix, Manuel J. A. Eugster

    Abstract: In a context where most published articles are devoted to the development of "new methods", comparison studies are generally appreciated by readers but surprisingly given poor consideration by many scientific journals. In connection with recent articles on over-optimism and epistemology published in Bioinformatics, this letter stresses the importance of neutral comparison studies for the objective…

    Submitted 13 August, 2012; originally announced August 2012.

  29. arXiv:0905.0603

    stat.ME stat.AP stat.CO

    Regularized estimation of large-scale gene association networks using graphical Gaussian models

    Authors: Nicole Kraemer, Juliane Schaefer, Anne-Laure Boulesteix

    Abstract: Graphical Gaussian models are popular tools for the estimation of (undirected) gene association networks from microarray data. A key issue when the number of variables greatly exceeds the number of samples is the estimation of the matrix of partial correlations. Since the (Moore-Penrose) inverse of the sample covariance matrix leads to poor estimates in this scenario, standard methods are inappr…

    Submitted 30 August, 2009; v1 submitted 5 May, 2009; originally announced May 2009.

    Comments: added additional experiments

    Journal ref: BMC Bioinformatics, 10:384, 2010