
Showing 1–9 of 9 results for author: Wang, K A

  1. arXiv:2501.12352  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Test-time regression: a unifying framework for designing sequence models with associative memory

    Authors: Ke Alexander Wang, Jiaxin Shi, Emily B. Fox

    Abstract: Sequence models lie at the heart of modern deep learning. However, rapid advancements have produced a diversity of seemingly unrelated architectures, such as Transformers and recurrent alternatives. In this paper, we introduce a unifying framework to understand and derive these sequence models, inspired by the empirical importance of associative recall, the capability to retrieve contextually rele…

    Submitted 1 May, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

  2. arXiv:2312.03344  [pdf, other]

    cs.LG math.DS stat.AP stat.ML

    Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild

    Authors: Ke Alexander Wang, Emily B. Fox

    Abstract: Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable, which hampers clinical adoption. In this paper, we propose…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: Proceedings of Machine Learning for Health (ML4H) 2023. Code available at: https://github.com/KeAWang/interpretable-cgm-representations

  3. arXiv:2305.01638  [pdf, other]

    cs.LG cs.CV stat.ML

    Sequence Modeling with Multiresolution Convolutional Memory

    Authors: Jiaxin Shi, Ke Alexander Wang, Emily B. Fox

    Abstract: Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space trade off between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural n…

    Submitted 1 November, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: ICML 2023, Source code: https://github.com/thjashin/multires-conv

  4. arXiv:2112.12986  [pdf, other]

    cs.LG stat.ML

    Is Importance Weighting Incompatible with Interpolating Classifiers?

    Authors: Ke Alexander Wang, Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto

    Abstract: Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We sh…

    Submitted 4 March, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

    Comments: International Conference on Learning Representations (ICLR), 2022

  5. arXiv:2106.06695  [pdf, other]

    cs.LG stat.ML

    SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes

    Authors: Sanyam Kapoor, Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

    Abstract: State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix vector multiplies (MVMs) with the covariance kernel. The Structured Kernel Interpolation (SKI) framework accelerates these MVMs by performing efficient MVMs on a grid and interpolating back to the original space. In this work, we develop a connection between SKI and the permutohedral lattice us…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: International Conference on Machine Learning (ICML), 2021

  6. arXiv:2104.09460  [pdf, other]

    stat.ML cs.AI cs.IT cs.LG cs.NE

    Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information

    Authors: Willie Neiswanger, Ke Alexander Wang, Stefano Ermon

    Abstract: In many real-world problems, we want to infer some property of an expensive black-box function $f$, given a budget of $T$ function evaluations. One example is budget constrained global optimization of $f$, for which Bayesian optimization is a popular method. Other properties of interest include local optima, level sets, integrals, or graph-structured information induced by $f$. Often, we can find…

    Submitted 6 July, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: Appears in Proceedings of the 38th International Conference on Machine Learning (ICML), 2021

  7. arXiv:2010.13581  [pdf, other]

    cs.LG math.DS physics.comp-ph physics.data-an stat.ML

    Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints

    Authors: Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

    Abstract: Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics. Recent works improve generalization for predicting trajectories by learning the Hamiltonian or Lagrangian of a system rather than the differential equations directly. While these methods encode the constraints of the systems using generalized coordinates, we show th…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/mfinzi/constrained-hamiltonian-neural-networks

  8. arXiv:1911.06944  [pdf, other]

    cs.LG cs.DC stat.CO stat.ME stat.ML

    $DC^2$: A Divide-and-conquer Algorithm for Large-scale Kernel Learning with Application to Clustering

    Authors: Ke Alexander Wang, Xinran Bian, Pan Liu, Donghui Yan

    Abstract: Divide-and-conquer is a general strategy to deal with large scale problems. It is typically applied to generate ensemble instances, which potentially limits the problem size it can handle. Additionally, the data are often divided by random sampling which may be suboptimal. To address these concerns, we propose the $DC^2$ algorithm. Instead of ensemble instances, we produce structure-preserving sig…

    Submitted 15 November, 2019; originally announced November 2019.

  9. arXiv:1903.08114  [pdf, other]

    cs.LG cs.DC stat.ML

    Exact Gaussian Processes on a Million Data Points

    Authors: Ke Alexander Wang, Geoff Pleiss, Jacob R. Gardner, Stephen Tyree, Kilian Q. Weinberger, Andrew Gordon Wilson

    Abstract: Gaussian processes (GPs) are flexible non-parametric models, with a capacity that grows with the available data. However, computational constraints with standard inference procedures have limited exact GPs to problems with fewer than about ten thousand training points, necessitating approximations for larger datasets. In this paper, we develop a scalable approach for exact GPs that leverages multi…

    Submitted 10 December, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

    Comments: Published at NeurIPS 2019