-
Selective Clustering Annotated using Modes of Projections
Authors:
Evan Greene,
Greg Finak,
Raphael Gottardo
Abstract:
Selective clustering annotated using modes of projections (SCAMP) is a new clustering algorithm for data in $\mathbb{R}^p$. SCAMP is motivated from the point of view of non-parametric mixture modeling. Rather than maximizing a classification likelihood to determine cluster assignments, SCAMP casts clustering as a search and selection problem. One consequence of this problem formulation is that the number of clusters is $\textbf{not}$ a SCAMP tuning parameter. The search phase of SCAMP consists of finding sub-collections of the data matrix, called candidate clusters, that obey shape constraints along each coordinate projection. An extension of the dip test of Hartigan and Hartigan (1985) is developed to assist the search. Selection occurs by scoring each candidate cluster with a preference function that quantifies prior belief about the mixture composition. Clustering proceeds by selecting candidates to maximize their total preference score. SCAMP concludes by annotating each selected cluster with labels that describe how cluster-level statistics compare to certain dataset-level quantities. SCAMP can be run multiple times on a single data matrix. Comparison of annotations obtained across iterations provides a measure of clustering uncertainty. Simulation studies and applications to real data are considered. A C++ implementation with R interface is $\href{https://github.com/RGLab/scamp}{available\ online}$.
Submitted 26 July, 2018;
originally announced July 2018.
-
Data Exploration, Quality Control and Testing in Single-Cell qPCR-Based Gene Expression Experiments
Authors:
Andrew McDavid,
Greg Finak,
Pratip K. Chattopadyay,
Maria Dominguez,
Laurie Lamoreaux,
Steven S. Ma,
Mario Roederer,
Raphael Gottardo
Abstract:
Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions (qPCR) now enables high-throughput single-cell gene expression measurement, allowing assessment of cellular heterogeneity. However, very few analytic tools have been developed specifically for the statistical and analytical challenges of single-cell qPCR data. We present a statistical framework for the exploration, quality control, and analysis of single-cell gene expression data from microfluidic arrays. We assess accuracy and within-sample heterogeneity of single-cell expression and develop quality control criteria to filter unreliable cell measurements. We propose a statistical model accounting for the fact that genes at the single-cell level can be on (in which case a continuous expression measure is recorded) or dichotomously off (in which case the recorded expression is zero). Based on this model, we derive a combined likelihood-ratio test for differential expression that incorporates both the discrete and continuous components. Using an experiment that examines treatment-specific changes in expression, we show that this combined test is more powerful than either the continuous or dichotomous component in isolation, or a t-test on the zero-inflated data. While developed for measurements from a specific platform (Fluidigm), these tools are generalizable to other multi-parametric measures over large numbers of events.
Submitted 3 October, 2012;
originally announced October 2012.
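The combined likelihood-ratio test described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the model details, so the following are assumptions made for illustration only: the on/off component is treated as Bernoulli, the positive expression values as normal with a common variance across groups, and the two likelihood-ratio statistics are summed and referred to a chi-square distribution. The function name `combined_lrt` is hypothetical and is not taken from the authors' software.

```python
import math


def _binom_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k 'on' cells out of n."""
    p = k / n
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if k < n:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def _rss(values, center):
    """Residual sum of squares of values around a given center."""
    return sum((v - center) ** 2 for v in values)


def combined_lrt(group_a, group_b):
    """Return (statistic, df, p_value) for one gene in two groups.

    Each group is a list of expression values; zero means the gene is off.
    Assumed model (not from the abstract): Bernoulli on/off component plus
    a normal component, with common variance, for the positive values.
    """
    pos_a = [x for x in group_a if x > 0]
    pos_b = [x for x in group_b if x > 0]
    n_a, n_b = len(group_a), len(group_b)
    k_a, k_b = len(pos_a), len(pos_b)

    # Discrete component: do the proportions of expressing cells differ?
    ll_alt = _binom_loglik(k_a, n_a) + _binom_loglik(k_b, n_b)
    ll_null = _binom_loglik(k_a + k_b, n_a + n_b)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    df = 1

    # Continuous component: do the means of the positive values differ?
    # Skipped when a group has no positive values or the fit is degenerate.
    if k_a > 0 and k_b > 0:
        pooled = pos_a + pos_b
        rss_null = _rss(pooled, sum(pooled) / len(pooled))
        rss_alt = _rss(pos_a, sum(pos_a) / k_a) + _rss(pos_b, sum(pos_b) / k_b)
        if rss_alt > 0:
            stat += max(0.0, len(pooled) * math.log(rss_null / rss_alt))
            df = 2

    # Chi-square upper tail has a closed form for 1 and 2 degrees of freedom.
    if df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, df, p_value
```

For example, `combined_lrt([0, 0, 0, 0, 1.0, 1.2], [3.0, 3.1, 2.9, 3.2, 0, 3.3])` combines evidence from both the change in the fraction of expressing cells and the shift in expression among the cells that are on, which is the intuition behind the power gain reported in the abstract.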
-
Mixture Models for Single Cell Assays with Applications to Vaccine Studies
Authors:
Greg Finak,
Andrew McDavid,
Pratip Chattopadhyay,
Maria Dominguez,
Steve De Rosa,
Mario Roederer,
Raphael Gottardo
Abstract:
In immunological studies, the characterization of small, functionally distinct cell subsets from blood and tissue is crucial to decipher system-level biological changes. An increasing number of studies rely on assays that provide single-cell measurements of multiple genes and proteins from bulk cell samples. A common problem in the analysis of such data is to identify biomarkers (or combinations thereof) that are differentially expressed between two biological conditions (e.g., before/after vaccination), where expression is defined as the proportion of cells expressing the biomarker or combination in the cell subset of interest.
Here, we present a Bayesian hierarchical framework based on a beta-binomial mixture model for testing for differential biomarker expression using single-cell assays. Our model allows inference to be subject-specific, as is typically required when assessing vaccine responses, while borrowing strength across subjects through common prior distributions. We propose two approaches for parameter estimation: an empirical-Bayes approach using an Expectation-Maximization algorithm and a fully Bayesian one based on a Markov chain Monte Carlo algorithm. We compare our method against frequentist approaches for single-cell assays including Fisher's exact test, a likelihood ratio test, and basic log-fold changes. Using several experimental assays measuring proteins or genes at the single-cell level and simulated data, we show that our method has higher sensitivity and specificity than alternative methods. Additional simulations show that our framework is also robust to model misspecification. Finally, we also demonstrate how our approach can be extended to testing multivariate differential expression across multiple biomarker combinations using a Dirichlet-multinomial model and illustrate this multivariate approach using single-cell gene expression data and simulations.
Submitted 30 August, 2012; v1 submitted 28 August, 2012;
originally announced August 2012.
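The beta-binomial mixture idea in the abstract above can be sketched for a single subject as follows. This is an illustration under stated assumptions, not the authors' implementation: the two mixture components are taken to be "one shared proportion across both conditions" versus "a separate proportion per condition", each proportion has a beta prior, and the hyperparameters and mixture weight are fixed by hand here, whereas the abstract estimates them by EM or MCMC. The function name `posterior_prob_response` and all parameter names are hypothetical.

```python
import math


def _log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior_prob_response(k_u, n_u, k_s, n_s,
                            prior_null=(1.0, 1.0),
                            prior_u=(1.0, 1.0),
                            prior_s=(1.0, 1.0),
                            weight=0.5):
    """Posterior probability that the two conditions differ, for one subject.

    k_u of n_u cells are positive in the first condition, k_s of n_s in the
    second. All priors and the mixture weight are illustrative assumptions.
    Binomial coefficients cancel between the components and are omitted.
    """
    a0, b0 = prior_null
    au, bu = prior_u
    a_s, b_s = prior_s

    # Null component: one shared proportion integrated over its beta prior.
    log_m0 = (_log_beta(a0 + k_u + k_s, b0 + (n_u - k_u) + (n_s - k_s))
              - _log_beta(a0, b0))

    # Alternative component: independent proportions for each condition.
    log_m1 = (_log_beta(au + k_u, bu + n_u - k_u) - _log_beta(au, bu)
              + _log_beta(a_s + k_s, b_s + n_s - k_s) - _log_beta(a_s, b_s))

    # Combine with the mixture weight through a numerically stable logistic.
    log_odds = math.log(weight) - math.log(1.0 - weight) + log_m1 - log_m0
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

With equal positive fractions in both conditions the posterior probability falls well below one half, and with a large difference (for instance 10 of 1000 versus 100 of 1000 cells) it approaches one. In a hierarchical fit, the shared priors are what allow strength to be borrowed across subjects.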