arXiv:2005.06848 [pdf, ps, other]

Multi-Node EM Algorithm for Finite Mixture Models

Authors: Sharon X. Lee, Geoffrey J. McLachlan, Kaleb L. Leemaqz

Abstract: Finite mixture models are powerful tools for modelling and analyzing heterogeneous data. Parameter estimation is typically carried out using maximum likelihood estimation via the Expectation-Maximization (EM) algorithm. Recently, the adoption of flexible distributions as component densities has become increasingly popular. Often, the EM algorithm for these models involves complicated expressions that are time-consuming to evaluate numerically. In this paper, we describe a parallel implementation of the EM algorithm suitable for both single-threaded and multi-threaded processors and for both single-machine and multiple-node systems. Numerical experiments are performed to demonstrate the potential performance gain in different settings. A comparison is also made across two commonly used platforms, R and MATLAB. For illustration, a fairly general mixture model is used in the comparison.

Submitted 14 May, 2020; originally announced May 2020.

Comments: 12 pages, 1 figure

arXiv:1904.12057 [pdf, ps, other]

Comment on "Hidden truncation hyperbolic distributions, finite mixtures thereof and their application for clustering" by Murray, Browne, and McNicholas

Authors: Geoffrey J. McLachlan, Sharon X. Lee

Abstract: We comment on the paper of Murray, Browne, and McNicholas (2017), who proposed mixtures of skew distributions, which they termed hidden truncation hyperbolic (HTH). They recently made a clarification (Murray, Browne, and McNicholas, 2019) concerning their claim that the so-called CFUST distribution is a special case of the HTH distribution. There are also some other matters in the original version of the paper that were in need of clarification as discussed here.

Submitted 26 April, 2019; originally announced April 2019.

Comments: 7 pages

arXiv:1810.04842 [pdf, ps, other]

On formulations of skew factor models: skew errors versus skew factors

Authors: Sharon X. Lee, Geoffrey J. McLachlan

Abstract: In the past few years, there have been a number of proposals for generalizing the factor analysis (FA) model and its mixture version (known as mixtures of factor analyzers (MFA)) using non-normal and asymmetric distributions. These models adopt various types of skew densities for either the factors or the errors. While the relationships between various choices of skew distributions have been discussed in the literature, the differences between placing the assumption of skewness on the factors or on the errors have not been closely studied. This paper examines these formulations and discusses the connections between these two types of formulations for skew factor models. In doing so, we introduce a further formulation that unifies these two formulations; that is, placing a skew distribution on both the factors and the errors.

Submitted 20 November, 2018; v1 submitted 11 October, 2018; originally announced October 2018.

arXiv:1802.02467 [pdf, other]

Mixtures of Factor Analyzers with Fundamental Skew Symmetric Distributions

Authors: Sharon X. Lee, Tsung-I Lin, Geoffrey J. McLachlan

Abstract: Mixtures of factor analyzers (MFA) provide a powerful tool for modelling high-dimensional datasets. In recent years, several generalizations of MFA have been developed where the normality assumption of the factors and/or of the errors was relaxed to allow for skewness in the data. However, due to the form of the adopted component densities, the distribution of the factors/errors in most of these models is typically limited to modelling skewness concentrated in a single direction. Here, we introduce a more flexible finite mixture of factor analyzers based on the class of scale mixtures of canonical fundamental skew normal (SMCFUSN) distributions. This very general class of skew distributions can capture various types of skewness and asymmetry in the data. In particular, the proposed mixture model of SMCFUSN factor analyzers (SMCFUSNFA) can simultaneously accommodate multiple directions of skewness. As such, it encapsulates many commonly used models as special and/or limiting cases, such as models of some versions of skew normal and skew t-factor analyzers, and skew hyperbolic factor analyzers. For illustration, we focus on the t-distribution member of the class of SMCFUSN distributions, leading to mixtures of canonical fundamental skew t-factor analyzers (CFUSTFA). Parameter estimation can be carried out by maximum likelihood via an EM-type algorithm. The usefulness and potential of the proposed model are demonstrated using two real datasets.

Submitted 26 October, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

arXiv:1608.02797 [pdf, other]

A block EM algorithm for multivariate skew normal and skew t-mixture models

Authors: Sharon X Lee, Kaleb L Leemaqz, Geoffrey J McLachlan

Abstract: Finite mixtures of skew distributions provide a flexible tool for modelling heterogeneous data with asymmetric distributional features. However, parameter estimation via the Expectation-Maximization (EM) algorithm can become very time-consuming due to the complicated expressions involved in the E-step that are numerically expensive to evaluate. A more time-efficient implementation of the EM algorithm was recently proposed which allows each component of the mixture model to be evaluated in parallel. In this paper, we develop a block implementation of the EM algorithm that allows the calculations in the E- and M-steps to be spread across a larger number of threads. We focus on the fitting of finite mixtures of multivariate skew normal and skew t-distributions, and show that both the E- and M-steps in the EM algorithm can be modified to allow the data to be split into blocks. The approach can be easily implemented for use by multicore and multi-processor machines. It can also be applied concurrently with the recently proposed multithreaded EM algorithm to achieve further reduction in computation time. The improvement in time performance is illustrated on some real datasets.

Submitted 9 August, 2016; originally announced August 2016.

arXiv:1606.02054 [pdf, other]

A simple multithreaded implementation of the EM algorithm for mixture models

Authors: Sharon X Lee, Kaleb L Lee, Geoffrey J McLachlan

Abstract: Finite mixture models have been widely used for the modelling and analysis of data from heterogeneous populations. Maximum likelihood estimation of the parameters is typically carried out via the Expectation-Maximization (EM) algorithm. The complexity of the implementation of the algorithm depends on the parametric distribution that is adopted as the component densities of the mixture model. In the case of the skew normal and skew t-distributions, for example, the E-step would involve complicated expressions that are computationally expensive to evaluate. This can become quite time-consuming for large and/or high-dimensional datasets. In this paper, we develop a multithreaded version of the EM algorithm for the fitting of finite mixture models. Due to the structure of the algorithm for these models, the E- and M-steps can be easily reformulated to be executed in parallel across multiple threads to take advantage of the processing power available in modern-day multicore machines. Our approach is simple and easy to implement, requiring only small changes to standard code. To illustrate the approach, we focus on a fairly general mixture model that includes as special or limiting cases some of the most commonly used mixture models including the normal, t-, skew normal, and skew t-mixture models.

Submitted 7 June, 2016; originally announced June 2016.

arXiv:1601.00773 [pdf, other]

Comment on "On Nomenclature, and the Relative Merits of Two Formulations of Skew Distributions" by A. Azzalini, R. Browne, M. Genton, and P. McNicholas

Authors: Geoffrey J. McLachlan, Sharon X. Lee

Abstract: We comment on the recent paper by Azzalini et al. (2015) on two different distributions proposed in the literature for the modelling of data that have asymmetric and possibly long-tailed clusters. They are referred to as the restricted and unrestricted skew normal and skew t-distributions by Lee and McLachlan (2013a). We clarify an apparent misunderstanding in Azzalini et al. (2015) of this nomenclature to distinguish between these two models. Also, we note that McLachlan and Lee (2014) have obtained improved results for the unrestricted model over those reported in Azzalini et al. (2015) for the two datasets that were analysed by them to form the basis of their claims on the relative superiority of the restricted and unrestricted models. On this matter of the relative superiority of these two models, Lee and McLachlan (2014b, 2016) have shown how a distribution belonging to the broader class, the canonical fundamental skew t (CFUST) class, can be fitted with little more computational effort than for the unrestricted distribution. The CFUST class includes the restricted and unrestricted distributions as special cases. Thus the user now has the option of letting the data decide as to which model is appropriate for their particular dataset.

Submitted 5 January, 2016; originally announced January 2016.

arXiv:1509.02069 [pdf, other]

EMMIXcskew: an R Package for the Fitting of a Mixture of Canonical Fundamental Skew t-Distributions

Authors: Sharon X. Lee, Geoffrey J. McLachlan

Abstract: This paper presents an R package EMMIXcskew for the fitting of the canonical fundamental skew t-distribution (CFUST) and finite mixtures of this distribution (FM-CFUST) via maximum likelihood (ML). The CFUST distribution provides a flexible family of models to handle non-normal data, with parameters for capturing skewness and heavy tails in the data. It formally encompasses the normal, t, and skew-normal distributions as special and/or limiting cases. A few other versions of the skew t-distributions are also nested within the CFUST distribution. In this paper, an Expectation-Maximization (EM) algorithm is described for computing the ML estimates of the parameters of the FM-CFUST model, and different strategies for initializing the algorithm are discussed and illustrated. The methodology is implemented in the EMMIXcskew package, and examples are presented using two real datasets. The EMMIXcskew package contains functions to fit the FM-CFUST model, including procedures for generating different initial values. Additional features include random sample generation and contour visualization in 2D and 3D.

Submitted 9 February, 2017; v1 submitted 7 September, 2015; originally announced September 2015.

arXiv:1411.2820 [pdf, other]

Supervised Classification of Flow Cytometric Samples via the Joint Clustering and Matching (JCM) Procedure

Authors: Sharon X. Lee, Geoffrey J. McLachlan, Saumyadipta Pyne

Abstract: We consider the use of the Joint Clustering and Matching (JCM) procedure for the supervised classification of a flow cytometric sample with respect to a number of predefined classes of such samples. The JCM procedure has been proposed as a method for the unsupervised classification of cells within a sample into a number of clusters and, in the case of multiple samples, the matching of these clusters across the samples. The two tasks of clustering and matching of the clusters are performed simultaneously within the JCM framework. In this paper, we consider the case where there are a number of distinct classes of samples whose class of origin is known, and the problem is to classify a new sample of unknown class of origin to one of these predefined classes. For example, the different classes might correspond to the types of a particular disease or to the various health outcomes of a patient subsequent to a course of treatment. We show and demonstrate on some real datasets how the JCM procedure can be used to carry out this supervised classification task. A mixture distribution is used to model the distribution of the expressions of a fixed set of markers for each cell in a sample, with the components in the mixture model corresponding to the various populations of cells in the composition of the sample. For each class of samples, a class template is formed by the adoption of random-effects terms to model the inter-sample variation within a class. The classification of a new unclassified sample is undertaken by assigning the unclassified sample to the class that minimizes the Kullback-Leibler distance between its fitted mixture density and each class density provided by the class templates.

Submitted 11 November, 2014; originally announced November 2014.

arXiv:1405.0685 [pdf, other]

Finite Mixtures of Canonical Fundamental Skew t-Distributions

Authors: Sharon X. Lee, Geoffrey J. McLachlan

Abstract: This is an extended version of the paper Lee and McLachlan (2014b) with simulations and applications added. This paper introduces a finite mixture of canonical fundamental skew t (CFUST) distributions for a model-based approach to clustering where the clusters are asymmetric and possibly long-tailed (Lee and McLachlan, 2014b). The family of CFUST distributions includes the restricted multivariate skew t (rMST) and unrestricted multivariate skew t (uMST) distributions as special cases. In recent years, a few versions of the multivariate skew t (MST) model have been put forward, together with various EM-type algorithms for parameter estimation. These formulations adopted either a restricted or unrestricted characterization for their MST densities. In this paper, we examine a natural generalization of these developments, employing the CFUST distribution as the parametric family for the component distributions, and point out that the restricted and unrestricted characterizations can be unified under this general formulation. We show that an exact implementation of the EM algorithm can be achieved for the CFUST distribution and mixtures of this distribution, and present some new analytical results for a conditional expectation involved in the E-step.

Submitted 4 May, 2014; originally announced May 2014.

Comments: This is an extended version of the paper Lee and McLachlan (2014b) with simulations and applications added

arXiv:1404.1733 [pdf, other]

Comment on "Comparing two formulations of skew distributions with special reference to model-based clustering" by A. Azzalini, R. Browne, M. Genton, and P. McNicholas

Authors: Geoffrey J. McLachlan, Sharon X. Lee

Abstract: In this paper, we comment on the recent comparison in Azzalini et al. (2014) of two different distributions proposed in the literature for the modelling of data that have asymmetric and possibly long-tailed clusters. They are referred to as the restricted and unrestricted skew t-distributions by Lee and McLachlan (2013a). Firstly, we wish to point out that in Lee and McLachlan (2014b), which preceded this comparison, it is shown how a distribution belonging to the broader class, the canonical fundamental skew t (CFUST) class, can be fitted with essentially no more computational effort than for the unrestricted distribution. The CFUST class includes the restricted and unrestricted distributions as special cases. Thus the user now has the option of letting the data decide as to which model is appropriate for their particular dataset. Secondly, we wish to identify several statements in the comparison by Azzalini et al. (2014) that demonstrate a serious misunderstanding of the reporting of results in Lee and McLachlan (2014a) on the relative performance of these two skew t-distributions. In particular, there is an apparent misunderstanding of the nomenclature that has been adopted to distinguish between these two models. Thirdly, we take the opportunity to report here that we have obtained improved fits, in some cases a marked improvement, for the unrestricted model for various cases corresponding to different combinations of the variables in the two real datasets that were used in Azzalini et al. (2014) to mount their claims on the relative superiority of the restricted and unrestricted models. For one case the misclassification rate of our fit under the unrestricted model is less than one third of their reported error rate. Our results thus reverse their claims on the ranking of the restricted and unrestricted models in such cases.

Submitted 7 April, 2014; originally announced April 2014.

arXiv:1401.8182 [pdf, other]

Maximum Likelihood Estimation for Finite Mixtures of Canonical Fundamental Skew t-Distributions: the Unification of the Unrestricted and Restricted Skew t-Mixture Models

Authors: Sharon X. Lee, Geoffrey J. McLachlan

Abstract: In this paper, we present an algorithm for the fitting of a location-scale variant of the canonical fundamental skew t (CFUST) distribution, a superclass of the restricted and unrestricted skew t-distributions. In recent years, a few versions of the multivariate skew t (MST) model have been put forward, together with various EM-type algorithms for parameter estimation. These formulations adopted either a restricted or unrestricted characterization for their MST densities. In this paper, we examine a natural generalization of these developments, employing the CFUST distribution as the parametric family for the component distributions, and point out that the restricted and unrestricted characterizations can be unified under this general formulation. We show that an exact implementation of the EM algorithm can be achieved for the CFUST distribution and mixtures of this distribution, and present some new analytical results for a conditional expectation involved in the E-step.

Submitted 31 January, 2014; originally announced January 2014.

arXiv:1310.5336 [pdf, other]

The skew-t factor analysis model

Authors: Tsung-I Lin, Pal H. Wu, Geoffrey J. McLachlan, Sharon X. Lee

Abstract: Factor analysis is a classical data reduction technique that seeks a potentially lower number of unobserved variables that can account for the correlations among the observed variables. This paper presents an extension of the factor analysis model by assuming jointly a restricted version of the multivariate skew t distribution for the latent factors and unobservable errors, called the skew-t factor analysis model. The proposed model shows robustness to violations of normality assumptions of the underlying latent factors and provides flexibility in capturing extra skewness as well as heavier tails of the observed data. A computationally feasible ECM algorithm is developed for computing maximum likelihood estimates of the parameters. The usefulness of the proposed methodology is illustrated by a real-life example, and the results also demonstrate its better performance over various existing methods.

Submitted 3 December, 2013; v1 submitted 20 October, 2013; originally announced October 2013.

arXiv:1307.1748 [pdf, other]

Extending mixtures of factor models using the restricted multivariate skew-normal distribution

Authors: Tsung-I Lin, Geoffrey J. McLachlan, Sharon X. Lee

Abstract: The mixture of factor analyzers (MFA) model provides a powerful tool for analyzing high-dimensional data as it can reduce the number of free parameters through its factor-analytic representation of the component covariance matrices. This paper extends the MFA model to incorporate a restricted version of the multivariate skew-normal distribution to model the distribution of the latent component factors, called mixtures of skew-normal factor analyzers (MSNFA). The proposed MSNFA model allows us to relax the need for the normality assumption for the latent factors in order to accommodate skewness in the observed data. The MSNFA model thus provides an approach to model-based density estimation and clustering of high-dimensional data exhibiting asymmetric characteristics. A computationally feasible ECM algorithm is developed for computing the maximum likelihood estimates of the parameters. Model selection can be made on the basis of three commonly used information-based criteria. The potential of the proposed methodology is exemplified through applications to two real examples, and the results are compared with those obtained from fitting the MFA model.

Submitted 6 July, 2013; originally announced July 2013.

arXiv:1211.5290 [pdf, ps, other]

EMMIX-uskew: An R Package for Fitting Mixtures of Multivariate Skew t-distributions via the EM Algorithm

Authors: Sharon X. Lee, Geoffrey J. McLachlan

Abstract: This paper describes an algorithm for fitting finite mixtures of unrestricted Multivariate Skew t (FM-uMST) distributions. The package EMMIX-uskew implements a closed-form expectation-maximization (EM) algorithm for computing the maximum likelihood (ML) estimates of the parameters for the (unrestricted) FM-MST model in R. EMMIX-uskew also supports visualization of fitted contours in two and three dimensions, and random sample generation from a specified FM-uMST distribution. Finite mixtures of skew t-distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy-tailed behaviour, for example, datasets from flow cytometry. In recent years, various versions of mixtures with multivariate skew t (MST) distributions have been proposed. However, these models adopted some restricted characterizations of the component MST distributions so that the E-step of the EM algorithm can be evaluated in closed form. This paper focuses on mixtures with unrestricted MST components, and describes an iterative algorithm for the computation of the ML estimates of its model parameters. The usefulness of the proposed algorithm is demonstrated in three applications to real data sets. The first example illustrates the use of the main function fmmst in the package by fitting an MST distribution to a bivariate unimodal flow cytometric sample. The second example fits a mixture of MST distributions to the Australian Institute of Sport (AIS) data, and demonstrates that EMMIX-uskew can provide better clustering results than mixtures with restricted MST components. In the third example, EMMIX-uskew is applied to classify cells in a trivariate flow cytometric dataset. Comparisons with other available methods suggest that the EMMIX-uskew result achieved a lower misclassification rate with respect to the labels given by benchmark gating analysis.

Submitted 27 March, 2013; v1 submitted 22 November, 2012; originally announced November 2012.

arXiv:1211.3602 [pdf, ps, other]

doi 10.1007/s11634-013-0132-8

On Mixtures of Skew Normal and Skew t-Distributions

Authors: Sharon X. Lee, Geoffrey J. McLachlan

Abstract: Finite mixtures of skew distributions have emerged as an effective tool in modelling heterogeneous data with asymmetric features. With various proposals appearing rapidly in recent years, which are similar but not identical, the connections between them and their relative performance have become rather unclear. This paper aims to provide a concise overview of these developments by presenting a systematic classification of the existing skew distributions into four types, thereby clarifying their close relationships. This also aids in understanding the link between some of the proposed expectation-maximization (EM) based algorithms for the computation of the maximum likelihood estimates of the parameters of the models. The final part of this paper presents an illustration of the performance of these mixture models in clustering a real dataset, relative to other non-elliptically contoured clustering methods and associated algorithms for their implementation.

Submitted 28 May, 2013; v1 submitted 15 November, 2012; originally announced November 2012.

Journal ref: Advances in Data Analysis and Classification 2013

arXiv:1109.4706 [pdf, ps, other]

On the fitting of mixtures of multivariate skew t-distributions via the EM algorithm

Authors: S. X. Lee, G. J. McLachlan

Abstract: We show how the expectation-maximization (EM) algorithm can be applied exactly for the fitting of mixtures of general multivariate skew t (MST) distributions, eliminating the need for computationally expensive Monte Carlo estimation. Finite mixtures of MST distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy-tailed behaviour. Recently, they have been exploited as an effective tool for modelling flow cytometric data. However, without restrictions on the characterizations of the component skew t-distributions, Monte Carlo methods have been used to fit these models. In this paper, we show how the EM algorithm can be implemented for the iterative computation of the maximum likelihood estimates of the model parameters without resorting to Monte Carlo methods for mixtures with unrestricted MST components. The fast calculation of semi-infinite integrals on the E-step of the EM algorithm is effected by noting that they can be put in the form of moments of the truncated multivariate t-distribution, which subsequently can be expressed in terms of the non-truncated form of the t-distribution function for which fast algorithms are available. We demonstrate the usefulness of the proposed methodology through applications to three real data sets.

Submitted 5 September, 2012; v1 submitted 22 September, 2011; originally announced September 2011.

Showing 1–17 of 17 results for author: Lee, S X