
Showing 1–12 of 12 results for author: de Rooij, M

Searching in archive stat.
  1. arXiv:2405.19865

    stat.ME stat.CO

    Reduced Rank Regression for Mixed Predictor and Response Variables

    Authors: Mark de Rooij, Lorenza Cotugno, Roberta Siciliano

    Abstract: In this paper, we propose the generalized mixed reduced rank regression method, GMR$^3$ for short. GMR$^3$ is a regression method for a mix of numeric, binary, and ordinal response variables. The predictor variables can be a mix of binary, nominal, ordinal, and numeric variables. For dealing with the categorical predictors we use optimal scaling. A majorization-minimization algorithm is derived fo…

    Submitted 22 January, 2025; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 29 pages, 4 figures

  2. arXiv:2402.07634

    stat.ME stat.CO

    A Multinomial Canonical Decomposition Model, with emphasis on the analysis of Multivariate Binary data

    Authors: Mark de Rooij

    Abstract: In this paper, we propose to decompose the canonical parameter of a multinomial model into a set of participant scores and category scores. External information about the participants or the categories can be used to restrict these scores. Therefore, we impose the constraint that the scores are linear combinations of the external variables. For the estimation of the parameters of the decomposition…

    Submitted 22 January, 2025; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: 28 pages, 0 figures
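The decomposition described in this abstract can be illustrated with a minimal sketch. It assumes participant scores u_i = Bᵀx_i and category scores v_c = Wᵀz_c (x_i and z_c are the external variables, B and W are weight matrices), canonical parameters θ_ic = u_i · v_c, and category probabilities from the softmax (multinomial logit) link. Intercepts and the paper's estimation algorithm are not shown, and all function and variable names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear_scores(features: Vector, weights: Matrix) -> Vector:
    """Scores as linear combinations of external variables (weights^T @ features)."""
    n_dims = len(weights[0])
    return [sum(f * row[s] for f, row in zip(features, weights))
            for s in range(n_dims)]


def category_probabilities(x_i: Vector, B: Matrix, Z: Matrix, W: Matrix) -> Vector:
    """Softmax over categories of theta_ic = u_i . v_c (hypothetical sketch)."""
    u_i = linear_scores(x_i, B)             # participant scores, restricted by x_i
    thetas = []
    for z_c in Z:                           # one row of external variables per category
        v_c = linear_scores(z_c, W)         # category scores, restricted by z_c
        thetas.append(sum(a * b for a, b in zip(u_i, v_c)))
    top = max(thetas)                       # subtract the max for numerical stability
    exps = [math.exp(t - top) for t in thetas]
    total = sum(exps)
    return [e / total for e in exps]


# Two participant variables, a 1-dimensional solution, four categories
# (e.g. the response patterns 00, 01, 10, 11 of two binary items, coded here
# by their number of "yes" answers).
probs = category_probabilities([1.0, 2.0], B=[[1.0], [0.0]],
                               Z=[[0.0], [1.0], [1.0], [2.0]], W=[[0.5]])
```

Restricting both score sets through B and W is what keeps the number of free parameters small when there are many categories, as happens with the response patterns of several binary variables.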

  3. arXiv:2402.07629

    stat.ME stat.CO

    Logistic Multidimensional Data Analysis for Ordinal Response Variables using a Cumulative Link function

    Authors: Mark de Rooij, Ligaya Breemer, Dion Woestenburg, Frank Busing

    Abstract: We present a multidimensional data analysis framework for the analysis of ordinal response variables. Underlying the ordinal variables, we assume a continuous latent variable, leading to cumulative logit models. The framework includes unsupervised methods, when no predictor variables are available, and supervised methods, when predictor variables are available. We distinguish between dominance var…

    Submitted 22 January, 2025; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: 56 pages, 10 figures
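The cumulative logit model mentioned in this abstract links an ordinal response to a latent continuous variable through ordered cut points. A minimal sketch for one response variable, assuming the linear predictor `eta` is already given (in the multidimensional framework it would come from the low-dimensional representation, which is not shown); the function names are hypothetical.

```python
import math
from typing import List


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cumulative_logit_probs(eta: float, thresholds: List[float]) -> List[float]:
    """P(Y = c) for c = 1..C under P(Y <= c) = logistic(threshold_c - eta).

    `thresholds` holds the C - 1 strictly increasing cut points of the
    latent continuous variable.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for cum in cumulative:
        probs.append(cum - previous)   # P(Y = c) = P(Y <= c) - P(Y <= c - 1)
        previous = cum
    return probs


low = cumulative_logit_probs(eta=0.0, thresholds=[-1.0, 0.0, 1.0])
high = cumulative_logit_probs(eta=2.0, thresholds=[-1.0, 0.0, 1.0])
```

A larger `eta` lowers every cumulative probability and therefore moves probability mass towards the higher categories.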

  4. Supervised and Unsupervised Mapping of Binary Variables: A proximity perspective

    Authors: Mark de Rooij, Dion Woestenburg, Frank Busing

    Abstract: We propose a new mapping tool for supervised and unsupervised analysis of multivariate binary data with multiple items, questions, or response variables. The mapping assumes an underlying proximity response function, where participants can have multiple reasons to disagree or say "no" to a question. The probability to endorse, or to agree with an item depends on an item specific parameter and th…

    Submitted 22 January, 2025; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: 38 pages, 11 figures

  5. arXiv:2308.08387

    stat.ML cs.LG

    Continuous Sweep for Binary Quantification Learning

    Authors: Kevin Kloos, Julian D. Karch, Quinten A. Meertens, Mark de Rooij

    Abstract: A quantifier is a supervised machine learning algorithm, focused on estimating the class prevalence in a dataset rather than labeling its individual observations. We introduce Continuous Sweep, a new parametric binary quantifier inspired by the well-performing Median Sweep, which is an ensemble method based on Adjusted Count estimators. We modified two aspects of Median Sweep: 1) using parametric…

    Submitted 11 October, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    MSC Class: 68U99
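The Adjusted Count estimator named in this abstract corrects the raw fraction of test items classified as positive (the classify-and-count rate `cc`) with the classifier's true- and false-positive rates: p̂ = (cc − fpr) / (tpr − fpr), clipped to [0, 1]. Median Sweep applies this at many decision thresholds and takes the median. The sketch below assumes the per-threshold `tprs` and `fprs` were estimated beforehand (e.g. by cross-validation on training data); the `min_gap` filter on tpr − fpr is an assumed guard against unstable denominators, and Continuous Sweep itself is not shown.

```python
import statistics
from typing import Sequence


def adjusted_count(cc: float, tpr: float, fpr: float) -> float:
    """Adjusted Count: correct the classify-and-count rate, clipped to [0, 1]."""
    if tpr <= fpr:
        raise ValueError("tpr must exceed fpr")
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


def median_sweep(test_scores: Sequence[float], thresholds: Sequence[float],
                 tprs: Sequence[float], fprs: Sequence[float],
                 min_gap: float = 0.25) -> float:
    """Median of Adjusted Count estimates over thresholds with tpr - fpr >= min_gap."""
    n = len(test_scores)
    estimates = []
    for t, tpr, fpr in zip(thresholds, tprs, fprs):
        if tpr - fpr < min_gap:
            continue  # denominator too small: skip this threshold
        cc = sum(1 for s in test_scores if s >= t) / n  # classify and count
        estimates.append(adjusted_count(cc, tpr, fpr))
    if not estimates:
        raise ValueError("no threshold passed the tpr - fpr filter")
    return statistics.median(estimates)


estimate = median_sweep(test_scores=[0.1, 0.4, 0.6, 0.9],
                        thresholds=[0.3, 0.5, 0.7],
                        tprs=[0.9, 0.8, 0.5], fprs=[0.5, 0.2, 0.1])
```

Taking the median over thresholds makes the ensemble less sensitive to any single threshold whose tpr/fpr estimates happen to be poor.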

  6. arXiv:2210.14484

    stat.ML cs.LG stat.ME

    Imputation of missing values in multi-view data

    Authors: Wouter van Loon, Marjolein Fokkema, Frank de Vos, Marisa Koini, Reinhold Schmidt, Mark de Rooij

    Abstract: Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high-dimensionality, can make the application of conditional imputation met…

    Submitted 20 June, 2024; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 49 pages, 15 figures. Accepted manuscript

    Journal ref: Information Fusion 111 (2024) 102524

  7. arXiv:2108.05761

    stat.ME cs.LG stat.AP stat.ML

    Analyzing hierarchical multi-view MRI data with StaPLR: An application to Alzheimer's disease classification

    Authors: Wouter van Loon, Frank de Vos, Marjolein Fokkema, Botond Szabo, Marisa Koini, Reinhold Schmidt, Mark de Rooij

    Abstract: Multi-view data refers to a setting where features are divided into feature sets, for example because they correspond to different sources. Stacked penalized logistic regression (StaPLR) is a recently introduced method that can be used for classification and automatically selecting the views that are most important for prediction. We introduce an extension of this method to a setting where the dat…

    Submitted 26 April, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: 36 pages, 9 figures. Accepted manuscript

    Journal ref: Frontiers in Neuroscience 16:830630 (2022) 1-15

  8. arXiv:2107.13920

    stat.ME

    The Bradley-Terry Regression Trunk approach for modelling preference data with small trees

    Authors: Alessio Baldassarre, Elise Dusseldorp, Antonio D'Ambrosio, Mark de Rooij, Claudio Conversano

    Abstract: This paper introduces the Bradley-Terry Regression Trunk model, a novel probabilistic approach for the analysis of preference data expressed through paired comparison rankings. In some cases, it may be reasonable to assume that the preferences expressed by individuals depend on their characteristics. Within the framework of tree-based partitioning, we specify a tree-based model estimating the join…

    Submitted 29 July, 2021; originally announced July 2021.
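The Bradley-Terry model underlying this approach gives P(i preferred over j) = π_i / (π_i + π_j) for positive item worths π. A minimal sketch, assuming plain paired-comparison win counts and no subject covariates (the tree-based trunk that lets preferences depend on individual characteristics is not shown), estimates the worths with the classic minorization-maximization (MM) update; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def bt_probability(worth_i: float, worth_j: float) -> float:
    """Bradley-Terry: P(i preferred over j) = worth_i / (worth_i + worth_j)."""
    return worth_i / (worth_i + worth_j)


def fit_bradley_terry(n_items: int, wins: Dict[Tuple[int, int], int],
                      iterations: int = 200) -> List[float]:
    """Estimate worths with the MM update w_i <- W_i / sum_j n_ij / (w_i + w_j).

    wins[(i, j)] counts how often item i was preferred over item j. The
    estimate exists only if the comparisons are suitably connected (e.g. no
    item wins, or loses, all of its comparisons). Worths are rescaled to sum
    to n_items because their overall scale is not identified.
    """
    worths = [1.0] * n_items
    for _ in range(iterations):
        updated = []
        for i in range(n_items):
            total_wins = sum(c for (winner, _loser), c in wins.items() if winner == i)
            denom = 0.0
            for j in range(n_items):
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)  # comparisons of i and j
                if n_ij:
                    denom += n_ij / (worths[i] + worths[j])
            updated.append(total_wins / denom if denom else worths[i])
        scale = n_items / sum(updated)
        worths = [w * scale for w in updated]
    return worths


# Item 0 was preferred over item 1 three times out of four comparisons.
worths = fit_bradley_terry(2, {(0, 1): 3, (1, 0): 1})
```

With two items the fitted probability simply reproduces the observed proportion of wins, 3/4.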

  9. arXiv:2102.08232

    stat.ME stat.CO stat.ML

    The MELODIC family for simultaneous binary logistic regression in a reduced space

    Authors: Mark de Rooij, Patrick J. F. Groenen

    Abstract: Logistic regression is a commonly used method for binary classification. Researchers often have more than a single binary response variable and simultaneous analysis is beneficial because it provides insight into the dependencies among response variables as well as between the predictor variables and the responses. Moreover, in such a simultaneous analysis the equations can lend each other strengt…

    Submitted 24 June, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: Comment [v2]: added a paragraph on page 7 about the equivalence to a logistic reduced rank model. Comment [v2]: the description of the relationship towards logistic reduced rank models is updated on page 37.
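The comments for this entry mention an equivalence to a logistic reduced rank model. A minimal sketch of such a model, assuming logits θ_ir = m_r + x_iᵀ B v_r for participant i and binary response r (X is n × P, B is P × S, V is R × S, so the logit matrix has rank at most S). It only computes the response probabilities; it is not the MELODIC estimation algorithm, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reduced_rank_logistic_probs(X: Matrix, B: Matrix, V: Matrix,
                                m: List[float]) -> Matrix:
    """P(y_ir = 1) = logistic(m_r + x_i^T B v_r) for all participants and responses."""
    U = matmul(X, B)                  # n x S: participant coordinates in the reduced space
    logits = matmul(U, transpose(V))  # n x R, rank at most S
    return [[logistic(m[r] + row[r]) for r in range(len(m))] for row in logits]


# Two predictors, three binary responses, a 1-dimensional reduced space.
probs = reduced_rank_logistic_probs(X=[[1.0, 0.0], [0.0, 1.0]], B=[[1.0], [-1.0]],
                                    V=[[1.0], [0.0], [-1.0]], m=[0.0, 0.0, 0.0])
```

Sharing the S-dimensional participant coordinates across all R equations is what lets the separate logistic regressions "lend each other strength".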

  10. arXiv:2010.16271

    stat.ML cs.LG stat.ME

    View selection in multi-view stacking: Choosing the meta-learner

    Authors: Wouter van Loon, Marjolein Fokkema, Botond Szabo, Mark de Rooij

    Abstract: Multi-view stacking is a framework for combining information from different views (i.e. different feature sets) describing the same set of objects. In this framework, a base-learner algorithm is trained on each view separately, and their predictions are then combined by a meta-learner algorithm. In a previous study, stacked penalized logistic regression, a special case of multi-view stacking, has…

    Submitted 15 April, 2024; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: 47 pages, 17 figures. Accepted manuscript

    MSC Class: 62; 68

    Journal ref: Advances in Data Analysis and Classification (2024)
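The multi-view stacking framework described in this abstract can be sketched in two steps: build "level-one" data by cross-validating one base learner per view, so each object gets one out-of-fold prediction per view, then combine those predictions with a meta-learner. The base-learner interface `fit(X, y) -> predict`, the interleaved fold assignment, and the logistic form of the meta-learner are assumptions for illustration; fitting the meta-learner weights (the choice this paper studies) is not shown.

```python
import math
from typing import Callable, List, Sequence

Row = List[float]
# Hypothetical base-learner interface: fit(X_train, y_train) returns a predict(x) function.
Learner = Callable[[List[Row], List[float]], Callable[[Row], float]]


def level_one_data(views: Sequence[List[Row]], y: List[float],
                   fit: Learner, k: int = 5) -> List[Row]:
    """Out-of-fold predictions of one base learner per view (n x V matrix)."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]  # simple interleaved folds
    Z = [[0.0] * len(views) for _ in range(n)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        for v, X in enumerate(views):                  # each view is modelled separately
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                Z[i][v] = predict(X[i])
    return Z


def meta_predict(z: Row, weights: Row, intercept: float) -> float:
    """Logistic meta-learner on the view-level predictions."""
    s = intercept + sum(w * zv for w, zv in zip(weights, z))
    return 1.0 / (1.0 + math.exp(-s))


def mean_learner(X: List[Row], y: List[float]) -> Callable[[Row], float]:
    """Hypothetical trivial base learner: always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda x: m


Z = level_one_data(views=[[[0.0]] * 4, [[1.0]] * 4], y=[1.0, 0.0, 1.0, 0.0],
                   fit=mean_learner, k=2)
```

A view whose meta-learner weight is zero contributes nothing to `meta_predict`, which is how the meta-learner can act as a view selector.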

  11. arXiv:1911.11463

    stat.ME

    The Early Roots of Statistical Learning in the Psychometric Literature: A review and two new results

    Authors: Mark de Rooij, Bunga Citra Pratiwi, Marjolein Fokkema, Elise Dusseldorp, Henk Kelderman

    Abstract: Machine and Statistical learning techniques become more and more important for the analysis of psychological data. Four core concepts of machine learning are the bias variance trade-off, cross-validation, regularization, and basis expansion. We present some early psychometric papers, from almost a century ago, that dealt with cross-validation and regularization. From this review it is safe to conc…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: 22 pages, 3 figures
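Two of the four concepts named in this abstract, cross-validation and regularization, fit together in a few lines. The sketch below is a hypothetical example, not taken from the paper: it chooses the ridge penalty `lam` for a one-predictor, no-intercept regression y ≈ b·x by k-fold cross-validation, using the closed-form ridge slope b = Σxy / (Σx² + λ).

```python
from typing import List, Sequence


def ridge_slope(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Ridge estimate for y = b * x without intercept: sum(xy) / (sum(x^2) + lam)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def cv_error(x: List[float], y: List[float], lam: float, k: int = 5) -> float:
    """Mean squared prediction error over k interleaved folds."""
    n = len(x)
    sq_err = 0.0
    for f in range(k):
        test = set(range(f, n, k))                       # held-out fold
        x_train = [x[i] for i in range(n) if i not in test]
        y_train = [y[i] for i in range(n) if i not in test]
        b = ridge_slope(x_train, y_train, lam)           # fit on the other folds
        sq_err += sum((y[i] - b * x[i]) ** 2 for i in test)
    return sq_err / n


def choose_lambda(x: List[float], y: List[float],
                  lambdas: Sequence[float], k: int = 5) -> float:
    """Penalty with the smallest cross-validated error."""
    return min(lambdas, key=lambda lam: cv_error(x, y, lam, k))


x = [float(v) for v in range(1, 11)]
y = [2.0 * v for v in x]            # noiseless data: no shrinkage is needed
best = choose_lambda(x, y, [0.0, 1.0, 10.0], k=5)
```

With noiseless data any shrinkage only adds bias, so cross-validation selects λ = 0; with noisy data a positive penalty can win by reducing variance, which is the bias-variance trade-off the abstract lists first.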

  12. arXiv:1811.02316

    stat.ML cs.LG stat.ME

    Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning

    Authors: Wouter van Loon, Marjolein Fokkema, Botond Szabo, Mark de Rooij

    Abstract: In biomedical research, many different types of patient data can be collected, such as various types of omics data and medical imaging modalities. Applying multi-view learning to these different sources of information can increase the accuracy of medical classification models compared with single-view procedures. However, collecting biomedical data can be expensive and/or burdening for patients, s…

    Submitted 12 May, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: 26 pages, 9 figures. Accepted manuscript

    Journal ref: Information Fusion 61 (2020) 113-123