
Showing 1–10 of 10 results for author: Bowman, B

Searching in archive cs.
  1. arXiv:2506.12149 [pdf, ps, other]

    cs.CL

    Maximally-Informative Retrieval for State Space Model Generation

    Authors: Evan Becker, Benjamin Bowman, Matthew Trager, Tian Yu Liu, Luca Zancato, Wei Xia, Stefano Soatto

    Abstract: Given a query and dataset, the optimal way of answering the query is to make use of all the information available. Modern LLMs exhibit impressive ability to memorize training data, but data not deemed important during training is forgotten, and information outside that training set cannot be made use of. Processing an entire dataset at inference time is infeasible due to the bounded nature of model r…

    Submitted 13 June, 2025; originally announced June 2025.

  2. arXiv:2412.13328 [pdf, other]

    cs.CL cs.LG

    Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models

    Authors: Elvis Nunez, Luca Zancato, Benjamin Bowman, Aditya Golatkar, Wei Xia, Stefano Soatto

    Abstract: The "state" of State Space Models (SSMs) represents their memory, which fades exponentially over an unbounded span. By contrast, Attention-based models have "eidetic" (i.e., verbatim, or photographic) memory over a finite span (context size). Hybrid architectures combine State Space layers with Attention, but still cannot recall the distant past and can access only the most recent tokens eidetical…

    Submitted 24 May, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

  3. arXiv:2407.06324 [pdf, other]

    cs.LG cs.CL cs.NE

    B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory

    Authors: Luca Zancato, Arjun Seshadri, Yonatan Dukler, Aditya Golatkar, Yantao Shen, Benjamin Bowman, Matthew Trager, Alessandro Achille, Stefano Soatto

    Abstract: We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resources to represent data either eidetically over a finite span ("context" in Transformers), or fading over an infinite span (in State Space Models, or SSMs). Recent h…

    Submitted 8 July, 2024; originally announced July 2024.

  4. arXiv:2304.13169 [pdf, other]

    cs.LG

    SAFE: Machine Unlearning With Shard Graphs

    Authors: Yonatan Dukler, Benjamin Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto

    Abstract: We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model. This process, also known as selective forgetting or unlearning, is often conducted by partitioning a dataset into shards, training fully independent models on each, then ensembling…

    Submitted 22 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted at ICCV 2023

  5. arXiv:2303.04105 [pdf, other]

    cs.LG cs.CV

    Your representations are in the network: composable and parallel adaptation for large scale models

    Authors: Yonatan Dukler, Alessandro Achille, Hao Yang, Varsha Vivek, Luca Zancato, Benjamin Bowman, Avinash Ravichandran, Charless Fowlkes, Ashwin Swaminathan, Stefano Soatto

    Abstract: We propose InCA, a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model. During training, InCA uses a single forward pass to extract multiple activations, which are passed to external cross-attention adapters, trained anew and combined or selected for downstream tasks. We show that, even when selecting a single top-scoring adapter, InCA achieve…

    Submitted 31 October, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted to NeurIPS 2023

  6. arXiv:2302.07994 [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting

    Authors: Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera, Giovanni Paolini, Stefano Soatto

    Abstract: We introduce À-la-carte Prompt Tuning (APT), a transformer-based scheme to tune prompts on distinct data so that they can be arbitrarily composed at inference time. The individual prompts can be trained in isolation, possibly on different devices, at different times, and on different distributions or domains. Furthermore, each prompt only contains information about the subset of data it was exposed…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: 13 pages, 4 figures, 8 tables

  7. arXiv:2211.07844 [pdf, other]

    cs.LG

    Characterizing the Spectrum of the NTK via a Power Series Expansion

    Authors: Michael Murray, Hui Jin, Benjamin Bowman, Guido Montufar

    Abstract: Under mild conditions on the network initialization we derive a power series expansion for the Neural Tangent Kernel (NTK) of arbitrarily deep feedforward networks in the infinite width limit. We provide expressions for the coefficients of this power series which depend on both the Hermite coefficients of the activation function and the depth of the network. We observe faster decay of the H…

    Submitted 28 February, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: 55 pages, 3 Figures, 1 Table

  8. arXiv:2206.02927 [pdf, other]

    stat.ML cs.LG

    Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime

    Authors: Benjamin Bowman, Guido Montufar

    Abstract: We provide quantitative bounds measuring the $L^2$ difference in function space between the trajectory of a finite-width network trained on finitely many samples from the idealized kernel dynamics of infinite width and infinite data. An implication of the bounds is that the network is biased to learn the top eigenfunctions of the Neural Tangent Kernel not just on the training set but over the enti…

    Submitted 14 October, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: 38 pages, 1 figure, to be published in NeurIPS 2022

  9. arXiv:2201.04738 [pdf, other]

    stat.ML cs.LG

    Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks

    Authors: Benjamin Bowman, Guido Montufar

    Abstract: We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow. We show that in the underparameterized regime the network learns eigenfunctions of an integral operator $T_{K^\infty}$ determined by the Neural Tangent Kernel (NTK) at rates corresponding to their eigenvalues. For example, for uniformly distributed data on the sphere $S^{d - 1}$ an…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 61 pages, submitted to ICLR 2022

  10. arXiv:2008.09192 [pdf]

    cs.CR cs.LG

    PicoDomain: A Compact High-Fidelity Cybersecurity Dataset

    Authors: Craig Laprade, Benjamin Bowman, H. Howie Huang

    Abstract: Analysis of cyber relevant data has become an area of increasing focus. As larger percentages of businesses and governments begin to understand the implications of cyberattacks, the impetus for better cybersecurity solutions has increased. Unfortunately, current cybersecurity datasets either offer no ground truth or do so with anonymized data. The former leads to a quandary when verifying results…

    Submitted 20 August, 2020; originally announced August 2020.