
Showing 1–14 of 14 results for author: Gu, A

Searching in archive stat.
  1. arXiv:2505.02621

    cs.LG math.OC stat.ML

    Mirror Mean-Field Langevin Dynamics

    Authors: Anming Gu, Juno Kim

    Abstract: The mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional on the Wasserstein space over $\mathbb{R}^d$, and has gained attention recently as a model for the gradient descent dynamics of interacting particle systems such as infinite-width two-layer neural networks. However, many problems of interest have constrained domains, which are not solved by existin…

    Submitted 5 May, 2025; originally announced May 2025.

  2. arXiv:2406.07475

    cs.LG stat.ML

    Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior

    Authors: Anming Gu, Edward Chien, Kristjan Greenewald

    Abstract: Trajectory inference seeks to recover the temporal dynamics of a population from snapshots of its (uncoupled) temporal marginals, i.e. where observed particles are not tracked over time. Prior works addressed this challenging problem under a stochastic differential equation (SDE) model with a gradient-driven drift in the observed space, introducing a minimum entropy estimator relative to the Wiene…

    Submitted 26 February, 2025; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: ICLR 2025

  3. arXiv:2306.15626

    cs.LG cs.AI cs.LO stat.ML

    LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

    Authors: Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar

    Abstract: Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an op…

    Submitted 27 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023 (Datasets and Benchmarks Track) as an oral presentation. Data, code, and models available at https://leandojo.org/

  4. arXiv:2202.07663

    astro-ph.IM astro-ph.CO stat.AP

    GIGA-Lens: Fast Bayesian Inference for Strong Gravitational Lens Modeling

    Authors: A. Gu, X. Huang, W. Sheu, G. Aldering, A. S. Bolton, K. Boone, A. Dey, A. Filipp, E. Jullo, S. Perlmutter, D. Rubin, E. F. Schlafly, D. J. Schlegel, Y. Shu, S. H. Suyu

    Abstract: We present GIGA-Lens: a gradient-informed, GPU-accelerated Bayesian framework for modeling strong gravitational lensing systems, implemented in TensorFlow and JAX. The three components, optimization using multi-start gradient descent, posterior covariance estimation with variational inference, and sampling via Hamiltonian Monte Carlo, all take advantage of gradient information through automatic di…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: 23 pages, 13 figures, 2 tables. Submitted to ApJ

  5. arXiv:2012.14966

    cs.LG stat.ML

    Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps

    Authors: Tri Dao, Nimit S. Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher Ré

    Abstract: Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps. However, choosing which of the myriad structured transformations to use (and its associated parameterization) is a laborious task that requires trading off…

    Submitted 5 January, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: International Conference on Learning Representations (ICLR) 2020 spotlight

  6. arXiv:2010.00402

    cs.DS cs.LG stat.ML

    From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

    Authors: Ines Chami, Albert Gu, Vaggos Chatziafratis, Christopher Ré

    Abstract: Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning algorithm that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's di…

    Submitted 1 October, 2020; originally announced October 2020.

  7. arXiv:2009.11242

    stat.AP stat.ME

    Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth

    Authors: Shi Dong, Zlatan Feric, Guangyu Li, Chieh Wu, April Z. Gu, Jennifer Dy, John Meeker, Ingrid Y. Padilla, Jose Cordero, Carmen Velez Vega, Zaira Rosario, Akram Alshawabkeh, David Kaeli

    Abstract: In this paper, we propose ensemble learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by an NIEHS P42 Center that is trying to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models addressing two major challenges present in the dataset: 1) the significant a…

    Submitted 23 September, 2020; originally announced September 2020.

    Journal ref: ICMLA 2020

  8. arXiv:2008.07669

    cs.LG stat.ML

    HiPPO: Recurrent Memory with Optimal Polynomial Projections

    Authors: Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré

    Abstract: A central problem in learning from sequential data is representing cumulative history in an incremental fashion as more data is processed. We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases. Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal soluti…

    Submitted 22 October, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

  9. arXiv:2008.06775

    cs.LG cs.AI cs.CV stat.ML

    Model Patching: Closing the Subgroup Performance Gap with Data Augmentation

    Authors: Karan Goel, Albert Gu, Yixuan Li, Christopher Ré

    Abstract: Classifiers in machine learning are often brittle when deployed. Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence or absence of a spurious bandage. To mitigate these performance differences, we introduce model patching, a two-stage framework for improving robustness that enc…

    Submitted 15 August, 2020; originally announced August 2020.

  10. arXiv:1903.05895

    cs.LG stat.ML

    Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations

    Authors: Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré

    Abstract: Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and…

    Submitted 28 December, 2020; v1 submitted 14 March, 2019; originally announced March 2019.

    Comments: International Conference on Machine Learning (ICML) 2019

  11. arXiv:1810.02309

    cs.LG stat.ML

    Learning Compressed Transforms with Low Displacement Rank

    Authors: Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré

    Abstract: The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual. Existing use of LDR matrices in deep learning has applied fixed displacement operators encoding forms of shift invariance akin to convolutions. We introduce a class of LDR matrices with more general displacement operators, and explicitly learn over both…

    Submitted 1 January, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

    Comments: NeurIPS 2018. Code available at https://github.com/HazyResearch/structured-nets

  12. arXiv:1804.03329

    cs.LG stat.ML

    Representation Tradeoffs for Hyperbolic Embeddings

    Authors: Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala

    Abstract: Hyperbolic embeddings offer excellent quality with few dimensions when embedding hierarchical data structures like synonym or type hierarchies. Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization. On WordNet, our combinatorial embedding obtains a mean-average-precision of 0.989 with only two dimensio…

    Submitted 24 April, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

  13. arXiv:1803.06084

    cs.LG stat.ML

    A Kernel Theory of Modern Data Augmentation

    Authors: Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

    Abstract: Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear natural…

    Submitted 20 March, 2019; v1 submitted 16 March, 2018; originally announced March 2018.

  14. arXiv:1408.2794

    q-fin.ST stat.AP

    Sector-Based Factor Models for Asset Returns

    Authors: Angela Gu, Patrick Zeng

    Abstract: Factor analysis is a statistical technique employed to evaluate how observed variables correlate through common factors and unique variables. While it is often used to analyze price movement in the unstable stock market, it does not always yield easily interpretable results. In this study, we develop improved factor models by explicitly incorporating sector information on our studied stocks. We ad…

    Submitted 11 August, 2014; originally announced August 2014.

    Comments: 10 pages, 6 figures