-
A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random
Authors:
Binh H. Ho,
Long Nguyen Chi,
TrungTin Nguyen,
Binh T. Nguyen,
Van Ha Hoang,
Christopher Drovandi
Abstract:
Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define heterogeneous subgroups and handling data that are missing not at random, a prevalent issue in fields like transcriptomics. While several notable methods have…
▽ More
Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define heterogeneous subgroups and handling data that are missing not at random, a prevalent issue in fields like transcriptomics. While several notable methods have been proposed to address these problems, they typically tackle each issue in isolation, thereby limiting their flexibility and adaptability. This paper introduces a unified framework designed to address these challenges simultaneously. Our approach incorporates a data-driven penalty matrix into penalized clustering to enable more flexible variable selection, along with a mechanism that explicitly models the relationship between missingness and latent class membership. We demonstrate that, under certain regularity conditions, the proposed framework achieves both asymptotic consistency and selection consistency, even in the presence of missing data. This unified strategy significantly enhances the capability and efficiency of model-based clustering, advancing methodologies for identifying informative variables that define homogeneous subgroups in the presence of complex missing data patterns. The performance of the framework, including its computational efficiency, is evaluated through simulations and demonstrated using both synthetic and real-world transcriptomic datasets.
△ Less
Submitted 25 May, 2025;
originally announced May 2025.
-
Functional Mixtures-of-Experts
Authors:
Faïcel Chamroukhi,
Nhat Thien Pham,
Van Hà Hoang,
Geoffrey J. McLachlan
Abstract:
We consider the statistical analysis of heterogeneous data for prediction in situations where the observations include functions, typically time series. We extend the modeling with Mixtures-of-Experts (ME), as a framework of choice in modeling heterogeneity in data for prediction with vectorial observations, to this functional data analysis context. We first present a new family of ME models, name…
▽ More
We consider the statistical analysis of heterogeneous data for prediction in situations where the observations include functions, typically time series. We extend the modeling with Mixtures-of-Experts (ME), as a framework of choice in modeling heterogeneity in data for prediction with vectorial observations, to this functional data analysis context. We first present a new family of ME models, named functional ME (FME) in which the predictors are potentially noisy observations, from entire functions. Furthermore, the data generating process of the predictor and the real response, is governed by a hidden discrete variable representing an unknown partition. Second, by imposing sparsity on derivatives of the underlying functional parameters via Lasso-like regularizations, we provide sparse and interpretable functional representations of the FME models called iFME. We develop dedicated expectation--maximization algorithms for Lasso-like (EM-Lasso) regularized maximum-likelihood parameter estimation strategies to fit the models. The proposed models and algorithms are studied in simulated scenarios and in applications to two real data sets, and the obtained results demonstrate their performance in accurately capturing complex nonlinear relationships and in clustering the heterogeneous regression data.
△ Less
Submitted 20 December, 2023; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Reconciling Bayesian and perimeter regularization for binary inversion
Authors:
Oliver R. A. Dunbar,
Matthew M. Dunlop,
Charles M. Elliott,
Viet Ha Hoang,
Andrew M. Stuart
Abstract:
A central theme in classical algorithms for the reconstruction of discontinuous functions from observational data is perimeter regularization via the use of the total variation. On the other hand, sparse or noisy data often demands a probabilistic approach to the reconstruction of images, to enable uncertainty quantification; the Bayesian approach to inversion, which itself introduces a form of re…
▽ More
A central theme in classical algorithms for the reconstruction of discontinuous functions from observational data is perimeter regularization via the use of the total variation. On the other hand, sparse or noisy data often demands a probabilistic approach to the reconstruction of images, to enable uncertainty quantification; the Bayesian approach to inversion, which itself introduces a form of regularization, is a natural framework in which to carry this out. In this paper the link between Bayesian inversion methods and perimeter regularization is explored. In this paper two links are studied: (i) the maximum a posteriori (MAP) objective function of a suitably chosen Bayesian phase-field approach is shown to be closely related to a least squares plus perimeter regularization objective; (ii) sample paths of a suitably chosen Bayesian level set formulation are shown to possess finite perimeter and to have the ability to learn about the true perimeter.
△ Less
Submitted 10 April, 2020; v1 submitted 6 June, 2017;
originally announced June 2017.