Showing 1–8 of 8 results for author: Bryan, J

Searching in archive stat.
  1. arXiv:2509.20702  [pdf, ps, other]

    stat.AP cs.AI q-bio.GN

    Incorporating LLM Embeddings for Variation Across the Human Genome

    Authors: Hongqian Niu, Jordan Bryan, Xihao Li, Didong Li

    Abstract: Recent advances in large language model (LLM) embeddings have enabled powerful representations for biological data, but most applications to date focus only on gene-level information. We present one of the first systematic frameworks to generate variant-level embeddings across the entire human genome. Using curated annotations from FAVOR, ClinVar, and the GWAS Catalog, we constructed semantic text…

    Submitted 24 September, 2025; originally announced September 2025.

  2. arXiv:2404.00753  [pdf, other]

    math.ST stat.ME

    Subscedastic weighted least squares estimates

    Authors: Jordan Bryan, Haibo Zhou, Didong Li

    Abstract: In the heteroscedastic linear model, the weighted least squares (WLS) estimate of the model coefficients is more efficient than the ordinary least squares (OLS) estimate. However, the practical application of WLS is challenging because it requires knowledge of the error variances. Feasible weighted least squares (FLS) estimates, which use approximations of the variances when they are unknown, ma…

    Submitted 27 May, 2025; v1 submitted 31 March, 2024; originally announced April 2024.

  3. arXiv:2310.12460  [pdf, other]

    stat.AP stat.ME

    Linear Source Apportionment using Generalized Least Squares

    Authors: Jordan Bryan, Peter Hoff

    Abstract: Motivated by applications to water quality monitoring using fluorescence spectroscopy, we develop the source apportionment model for high dimensional profiles of dissolved organic matter (DOM). We describe simple methods to estimate the parameters of a linear source apportionment model, and show how the estimates are related to those of ordinary and generalized least squares. Using this least squa…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 31 pages, 5 figures

  4. arXiv:2112.07465  [pdf, other]

    stat.ME stat.CO

    The multirank likelihood for semiparametric canonical correlation analysis

    Authors: Jordan G. Bryan, Jonathan Niles-Weed, Peter D. Hoff

    Abstract: Many analyses of multivariate data focus on evaluating the dependence between two sets of variables, rather than the dependence among individual variables within each set. Canonical correlation analysis (CCA) is a classical data analysis technique that estimates parameters describing the dependence between such sets. However, inference procedures based on traditional CCA rely on the assumption tha…

    Submitted 22 April, 2024; v1 submitted 14 December, 2021; originally announced December 2021.

  5. arXiv:2110.05371  [pdf, other]

    cs.SE cs.LG cs.SI stat.ML

    Graph-Based Machine Learning Improves Just-in-Time Defect Prediction

    Authors: Jonathan Bryan, Pablo Moriano

    Abstract: The increasing complexity of today's software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and using traditional machine learning (ML) methods to make these determinations seems…

    Submitted 14 April, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: 22 pages, 2 figures, 4 tables; references added; expanded results to match baseline conditions

    Journal ref: PLoS ONE 18(4): e0284077, 2023

  6. arXiv:2004.07887  [pdf, other]

    stat.AP stat.ME

    Smaller $p$-values in genomics studies using distilled historical information

    Authors: Jordan G. Bryan, Peter D. Hoff

    Abstract: Medical research institutions have generated massive amounts of biological data by genetically profiling hundreds of cancer cell lines. In parallel, academic biology labs have conducted genetic screens on small numbers of cancer cell lines under custom experimental conditions. In order to share information between these two approaches to scientific discovery, this article proposes a "frequentist a…

    Submitted 16 April, 2020; originally announced April 2020.

  7. arXiv:1712.07349  [pdf]

    stat.OT

    Data Science: A Three Ring Circus or a Big Tent?

    Authors: Jennifer Bryan, Hadley Wickham

    Abstract: This is part of a collection of discussion pieces on David Donoho's paper 50 Years of Data Science, appearing in Volume 26, Issue 4 of the Journal of Computational and Graphical Statistics (2017).

    Submitted 20 December, 2017; originally announced December 2017.

  8. arXiv:1409.2655  [pdf, other]

    stat.ML cs.LG

    Weighted Classification Cascades for Optimizing Discovery Significance in the HiggsML Challenge

    Authors: Lester Mackey, Jordan Bryan, Man Yue Mo

    Abstract: We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics. The approach alternates between solving a weighted binary classification problem and updating class weights in a simple, closed-form manner. Moreover, an argument based on convex duality shows that an improvement in weighted classification error on any round yields a co…

    Submitted 10 September, 2015; v1 submitted 9 September, 2014; originally announced September 2014.