Search | arXiv e-print repository

Dendrogram of mixing measures: Hierarchical clustering and model selection for finite mixture models

Authors: Dat Do, Linh Do, Scott A. McKinley, Jonathan Terhorst, XuanLong Nguyen

Abstract: We present a new way to summarize and select mixture models via the hierarchical clustering tree (dendrogram) constructed from an overfitted latent mixing measure. Our proposed method bridges agglomerative hierarchical clustering and mixture modeling. The dendrogram's construction is derived from the theory of convergence of the mixing measures, and as a result, we can both consistently select the true number of mixing components and obtain the pointwise optimal convergence rate for parameter estimation from the tree, even when the model parameters are only weakly identifiable. In theory, it explicates the choice of the optimal number of clusters in hierarchical clustering. In practice, the dendrogram reveals more information on the hierarchy of subpopulations compared to traditional ways of summarizing mixture models. Several simulation studies are carried out to support our theory. We also illustrate the methodology with an application to single-cell RNA sequence analysis.

Submitted 8 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 53 pages, 11 figures

arXiv:2111.10841 [pdf, other]

A linear adjustment based approach to posterior drift in transfer learning

Authors: Subha Maity, Diptavo Dutta, Jonathan Terhorst, Yuekai Sun, Moulinath Banerjee

Abstract: We present a new model and methods for the posterior drift problem where the regression function in the target domain is modeled as a linear adjustment (on an appropriate scale) of that in the source domain, an idea that inherits the simplicity and the usefulness of generalized linear models and accelerated failure time models from the classical statistics literature, and study the theoretical properties of our proposed estimator in the binary classification problem. Our approach is shown to be flexible and applicable in a variety of statistical settings, and can be adopted to transfer learning problems in various domains including epidemiology, genetics and biomedicine. As a concrete application, we illustrate the power of our approach through mortality prediction for British Asians by borrowing strength from similar data from the larger pool of British Caucasians, using the UK Biobank data.

Submitted 12 December, 2021; v1 submitted 21 November, 2021; originally announced November 2021.

arXiv:2003.01640 [pdf, other]

Explaining Groups of Points in Low-Dimensional Representations

Authors: Gregory Plumb, Jonathan Terhorst, Sriram Sankararaman, Ameet Talwalkar

Abstract: A common workflow in data exploration is to learn a low-dimensional representation of the data, identify groups of points in that representation, and examine the differences between the groups to determine what they represent. We treat this workflow as an interpretable machine learning problem by leveraging the model that learned the low-dimensional representation to help identify the key differences between the groups. To solve this problem, we introduce a new type of explanation, a Global Counterfactual Explanation (GCE), and our algorithm, Transitive Global Translations (TGT), for computing GCEs. TGT identifies the differences between each pair of groups using compressed sensing but constrains those pairwise differences to be consistent among all of the groups. Empirically, we demonstrate that TGT is able to identify explanations that accurately explain the model while being relatively sparse, and that these explanations match real patterns in the data.

Submitted 14 August, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

arXiv:1409.1458 [pdf, ps, other]

Communication-Efficient Distributed Dual Coordinate Ascent

Authors: Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan

Abstract: Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. In this paper, we propose a communication-efficient framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that as compared to state-of-the-art mini-batch versions of SGD and SDCA algorithms, CoCoA converges to the same .001-accurate solution quality on average 25x as quickly.

Submitted 29 September, 2014; v1 submitted 4 September, 2014; originally announced September 2014.

Comments: NIPS 2014 version, including proofs. Published in Advances in Neural Information Processing Systems 27 (NIPS 2014)

MSC Class: 90C25; 68W15

ACM Class: G.1.6; C.1.4

Showing 1–4 of 4 results for author: Terhorst, J