Showing 1–7 of 7 results for author: Bair, E

Searching in archive stat.
  1. arXiv:1610.01424  [pdf, other]

    stat.ME stat.ML

    Non-Parametric Cluster Significance Testing with Reference to a Unimodal Null Distribution

    Authors: Erika S. Helgeson, Eric Bair

    Abstract: Cluster analysis is an unsupervised learning strategy that can be employed to identify subgroups of observations in data sets of unknown structure. This strategy is particularly useful for analyzing high-dimensional data such as microarray gene expression data. Many clustering methods are available, but it is challenging to determine if the identified clusters represent distinct subgroups. We prop…

    Submitted 5 October, 2016; v1 submitted 5 October, 2016; originally announced October 2016.

  2. arXiv:1407.3010  [pdf, other]

    stat.ME stat.ML

    Biclustering Via Sparse Clustering

    Authors: Qian Liu, Guanhua Chen, Michael R. Kosorok, Eric Bair

    Abstract: In many situations it is desirable to identify clusters that differ with respect to only a subset of features. Such clusters may represent homogeneous subgroups of patients with a disease, such as cancer or chronic pain. We define a bicluster to be a submatrix U of a larger data matrix X such that the features and observations in U differ from those not contained in U. For example, the observation…

    Submitted 10 July, 2014; originally announced July 2014.

    Comments: 40 pages, 8 figures, 10 tables

  3. arXiv:1307.0252  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Semi-supervised clustering methods

    Authors: Eric Bair

    Abstract: Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information…

    Submitted 30 June, 2013; originally announced July 2013.

    Comments: 28 pages, 5 figures

    Journal ref: WIREs Comp Stat, 2013, 5(5): 349-361

  4. arXiv:1304.3839  [pdf, ps, other]

    stat.ME stat.AP

    Parameter estimation in Cox models with missing failure indicators and the OPPERA study

    Authors: Naomi Brownstein, Jianwen Cai, Gary Slade, Eric Bair

    Abstract: In a prospective cohort study, examining all participants for incidence of the condition of interest may be prohibitively expensive. For example, the "gold standard" for diagnosing temporomandibular disorder (TMD) is a physical examination by a trained clinician. In large studies, examining all participants in this manner is infeasible. Instead, it is common to use questionnaires to screen for inc…

    Submitted 7 July, 2015; v1 submitted 13 April, 2013; originally announced April 2013.

    Comments: Version 4: 23 pages, 0 figures

  5. arXiv:1304.3838  [pdf, ps, other]

    stat.ME q-bio.GN q-bio.QM stat.AP

    Identification of significant features in DNA microarray data

    Authors: Eric Bair

    Abstract: DNA microarrays are a relatively new technology that can simultaneously measure the expression level of thousands of genes. They have become an important tool for a wide variety of biological experiments. One of the most common goals of DNA microarray experiments is to identify genes associated with biological processes of interest. Conventional statistical tests often produce poor results when ap…

    Submitted 13 April, 2013; originally announced April 2013.

    Comments: 35 pages, 6 figures. To be published in WIREs Computational Statistics

    Journal ref: WIREs Comp Stat, 2013, 5(4): 309-325

  6. arXiv:1304.3760  [pdf, ps, other]

    stat.ME cs.LG q-bio.QM stat.AP stat.ML

    Identification of relevant subtypes via preweighted sparse clustering

    Authors: Sheila Gaynor, Eric Bair

    Abstract: Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to identify subgroups that are associated with a particular outcome of interest. Conventional clustering methods generally do not identify such subgroups, particula…

    Submitted 21 September, 2016; v1 submitted 12 April, 2013; originally announced April 2013.

    Comments: Version 4: 49 pages, 5 figures

  7. Cross-Validation for Nonlinear Mixed Effects Models

    Authors: Emily Colby, Eric Bair

    Abstract: Cross-validation is frequently used for model selection in a variety of applications. However, it is difficult to apply cross-validation to mixed effects models (including nonlinear mixed effects models or NLME models) due to the fact that cross-validation requires "out-of-sample" predictions of the outcome variable, which cannot be easily calculated when random effects are present. We describe tw…

    Submitted 9 April, 2013; originally announced April 2013.

    Comments: 38 pages, 15 figures. To be published in the Journal of Pharmacokinetics and Pharmacodynamics

    Journal ref: Journal of Pharmacokinetics and Pharmacodynamics, April 2013, 40(2): 243-252