
Showing 1–6 of 6 results for author: Fukuyama, J

Searching in archive stat.
  1. arXiv:2504.04528

    cs.LG cs.AI stat.ME stat.ML

    A Consequentialist Critique of Binary Classification Evaluation Practices

    Authors: Gerardo Flores, Abigail Schiff, Alyssa H. Smith, Julia A Fukuyama, Ashia C. Wilson

    Abstract: ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-R…

    Submitted 30 June, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  2. Calibrating dimension reduction hyperparameters in the presence of noise

    Authors: Justin Lin, Julia Fukuyama

    Abstract: The goal of dimension reduction tools is to construct a low-dimensional representation of high-dimensional data. These tools are employed for a variety of reasons, such as noise reduction, visualization, and lowering computational costs. However, one fundamental issue that is discussed in other modeling problems is often overlooked in dimension reduction: overfitting. In the context o…

    Submitted 11 July, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 22 pages, 35 figures

    Journal ref: PLOS Computational Biology (2024)

  3. arXiv:2109.05541

    stat.AP stat.CO

    Multiscale Analysis of Count Data through Topic Alignment

    Authors: Julia Fukuyama, Kris Sankaran, Laura Symul

    Abstract: Topic modeling is a popular method used to describe biological count data. With topic models, the user must specify the number of topics $K$. Since there is no definitive way to choose $K$ and since a true value might not exist, we develop techniques to study the relationships across models with different $K$. This can show how many topics are consistently present across different models, if a top…

    Submitted 8 March, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

  4. arXiv:2008.02662

    stat.ME

    Local biplots for multi-dimensional scaling, with application to the microbiome

    Authors: Julia Fukuyama

    Abstract: We present local biplots, an extension of the classic principal components biplot to multi-dimensional scaling. Noticing that principal components biplots have an interpretation as the Jacobian of a map from data space to the principal subspace, we define local biplots as the Jacobian of the analogous map for multi-dimensional scaling. In the process, we show a close relationship between our loc…

    Submitted 6 August, 2020; originally announced August 2020.

  5. arXiv:2007.01340

    q-bio.PE stat.AP

    Lack of evidence for a substantial rate of templated mutagenesis in B cell diversification

    Authors: Julia Fukuyama, Branden J Olson, Frederick A Matsen IV

    Abstract: B cell receptor sequences diversify through mutations introduced by purpose-built cellular machinery. A recent paper has concluded that a "templated mutagenesis" process is a major contributor to somatic hypermutation, and therefore immunoglobulin diversification, in mice and humans. In this proposed process, mutations in the immunoglobulin locus are introduced by copying short segments from other…

    Submitted 2 July, 2020; originally announced July 2020.

  6. arXiv:1702.00501

    stat.ME stat.AP

    Adaptive gPCA: A method for structured dimensionality reduction

    Authors: Julia Fukuyama

    Abstract: When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results which are unstable and difficult to interpret. T…

    Submitted 1 February, 2017; originally announced February 2017.

    Comments: 26 pages, 5 figures