Showing 1–11 of 11 results for author: Ankner, Z

  1. arXiv:2506.10139

    cs.CL cs.AI

    Unsupervised Elicitation of Language Models

    Authors: Jiaxin Wen, Zachary Ankner, Arushi Somani, Peter Hase, Samuel Marks, Jacob Goldman-Wetzler, Linda Petrini, Henry Sleight, Collin Burns, He He, Shi Feng, Ethan Perez, Jan Leike

    Abstract: To steer pretrained language models for downstream tasks, today's post-training paradigm relies on humans to specify desired behaviors. However, for models with superhuman capabilities, it is difficult or impossible to get high-quality human supervision. To address this challenge, we introduce a new unsupervised algorithm, Internal Coherence Maximization (ICM), to fine-tune pretrained language mod…

    Submitted 11 June, 2025; originally announced June 2025.

  2. arXiv:2502.11517

    cs.CL cs.DC cs.LG

    Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

    Authors: Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin

    Abstract: Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work explored parallel decoding by identifying and simultaneously generating semantically independent chunks of LLM responses. However, these techniques rely on hand-crafted heuristics tied to syntactic structures like lists and paragraphs, making the…

    Submitted 21 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 15 pages

  3. arXiv:2411.04330

    cs.LG cs.CL

    Scaling Laws for Precision

    Authors: Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan

    Abstract: Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling laws for both training and inference. We propose that training in lower precision reduces the model's "effective parameter count," allowing us to predict the additional loss incurred from training in low precis…

    Submitted 29 November, 2024; v1 submitted 6 November, 2024; originally announced November 2024.

  4. arXiv:2408.11791

    cs.LG

    Critique-out-Loud Reward Models

    Authors: Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan D. Chang, Prithviraj Ammanabrolu

    Abstract: Traditionally, reward models used for reinforcement learning from human feedback (RLHF) are trained to directly predict preference scores without leveraging the generation capabilities of the underlying large language model (LLM). This limits the capabilities of reward models as they must reason implicitly about the quality of a response, i.e., preference modeling must be performed in a single for…

    Submitted 21 August, 2024; originally announced August 2024.

  5. arXiv:2406.11196

    cs.CV

    Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion

    Authors: Rishab Parthasarathy, Zachary Ankner, Aaron Gokaslan

    Abstract: A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by jointly optimizing for consistency across both time and views of the scene. In this paper, we instead investigate whether it is necessary to explicitly enforce…

    Submitted 30 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 14 pages, 10 figures, 3 tables

  6. arXiv:2405.20541

    cs.LG cs.CL

    Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

    Authors: Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

    Abstract: In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected…

    Submitted 30 May, 2024; originally announced May 2024.
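For illustration, the basic pruning step this abstract describes (score each document by a reference model's perplexity, then keep a subset) can be sketched in Python. This is a minimal sketch, not the authors' code: it assumes per-token natural-log probabilities from the small reference model are already computed, and the helper names `perplexity` and `prune_by_perplexity` and the `mode` options are hypothetical. The abstract does not say which perplexity band should be kept, so the selection rule is left as a parameter.

```python
import math
from typing import List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one document from its per-token natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def prune_by_perplexity(docs_logprobs: Sequence[Sequence[float]],
                        keep_fraction: float,
                        mode: str = "low") -> List[int]:
    """Return sorted indices of the documents kept after perplexity-based pruning.

    mode (hypothetical options): "low" keeps the lowest-perplexity documents,
    "high" the highest, and "medium" a contiguous band around the median.
    """
    scores = [perplexity(lp) for lp in docs_logprobs]
    # Document indices ordered by ascending reference-model perplexity.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, round(keep_fraction * len(order)))
    if mode == "low":
        kept = order[:k]
    elif mode == "high":
        kept = order[-k:]
    elif mode == "medium":
        start = (len(order) - k) // 2
        kept = order[start:start + k]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(kept)
```

Sorting once and slicing keeps selection at O(n log n) in the number of documents; in practice the perplexities would come from one batched pass of the reference model over the corpus.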

  7. arXiv:2402.05109

    cs.LG

    Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

    Authors: Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon

    Abstract: To combat the memory bandwidth-bound nature of autoregressive LLM inference, previous research has proposed the speculative decoding framework. To perform speculative decoding, a small draft model proposes candidate continuations of the input sequence that are then verified in parallel by the base model. One way to specify the draft model, as used in the recent Medusa decoding framework, is as a…

    Submitted 7 October, 2024; v1 submitted 7 February, 2024; originally announced February 2024.
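The draft-then-verify loop summarized in this abstract can be sketched as below, assuming greedy (exact-match) verification. This is a generic speculative-decoding sketch, not Hydra's or Medusa's actual acceptance rule or draft-head design; `base_next_token` is a hypothetical stand-in for the base model's argmax prediction.

```python
from typing import Callable, List, Sequence


def verify_draft(prefix: Sequence[int],
                 draft: Sequence[int],
                 base_next_token: Callable[[Sequence[int]], int]) -> List[int]:
    """Greedy verification of draft tokens against a base model.

    `base_next_token(tokens)` (hypothetical) returns the base model's greedy
    next token after `tokens`. A real system scores every draft position in a
    single batched forward pass; the loop here is only for clarity.
    Returns the tokens appended this step: the longest draft prefix matching
    the base model's greedy choices, followed by one token from the base model.
    """
    context = list(prefix)
    accepted: List[int] = []
    for tok in draft:
        target = base_next_token(context)
        if target != tok:
            # First mismatch: emit the base model's token and stop.
            accepted.append(target)
            return accepted
        accepted.append(tok)
        context.append(tok)
    # All draft tokens matched, so the same pass also yields a "bonus" token.
    accepted.append(base_next_token(context))
    return accepted
```

Because every emitted token equals the base model's greedy choice, the output matches plain greedy decoding; the speedup comes from emitting between 1 and `len(draft) + 1` tokens per base-model pass.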

  8. arXiv:2311.09431

    cs.LG cs.CL

    Striped Attention: Faster Ring Attention for Causal Transformers

    Authors: William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

    Abstract: To help address the growing demand for ever-longer sequence lengths in transformer models, Liu et al. recently proposed Ring Attention, an exact attention algorithm capable of overcoming per-device memory bottlenecks by distributing self-attention across multiple devices. In this paper, we study the performance characteristics of Ring Attention in the important special case of causal transformer…

    Submitted 15 November, 2023; originally announced November 2023.

  9. arXiv:2305.15096

    cs.CL cs.AI

    Dynamic Masking Rate Schedules for MLM Pretraining

    Authors: Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

    Abstract: Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%. We propose to instead dynamically schedule the masking rate throughout training. We find that linearly decreasing the masking rate over the course of pretraining improves average GLUE accuracy by up to 0.46% and 0.25% in BERT-base and BERT-large, respectivel…

    Submitted 10 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.
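A linearly decreasing masking-rate schedule, as this abstract describes, can be sketched as follows. The 30% start rate is a hypothetical choice and the 15% end rate simply reuses the fixed BERT rate the abstract mentions; `masking_rate` and `sample_mask` are hypothetical helper names, not the authors' code.

```python
import random
from typing import List


def masking_rate(step: int, total_steps: int,
                 start_rate: float = 0.30, end_rate: float = 0.15) -> float:
    """Masking rate at `step`, decreasing linearly from start_rate to end_rate.

    The default rates are illustrative assumptions, not values from the paper.
    """
    if total_steps <= 1:
        return end_rate
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start_rate + (end_rate - start_rate) * progress


def sample_mask(num_tokens: int, rate: float, rng: random.Random) -> List[int]:
    """Pick round(rate * num_tokens) distinct token positions to mask."""
    k = round(rate * num_tokens)
    return sorted(rng.sample(range(num_tokens), k))


# Example: mask positions for a 128-token sequence halfway through a 10,000-step run.
rate = masking_rate(step=5_000, total_steps=10_000)
positions = sample_mask(num_tokens=128, rate=rate, rng=random.Random(0))
```

This sketch masks an exact count per sequence; many BERT-style pipelines instead sample each position independently and apply BERT's 80/10/10 mask/random/keep replacement split, which is omitted here.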

  10. arXiv:2212.00291

    cs.LG

    The Effect of Data Dimensionality on Neural Network Prunability

    Authors: Zachary Ankner, Alex Renda, Gintare Karolina Dziugaite, Jonathan Frankle, Tian Jin

    Abstract: Practitioners prune neural networks for efficiency gains and generalization improvements, but few scrutinize the factors determining the prunability of a neural network: the maximum fraction of weights that pruning can remove without compromising the model's test accuracy. In this work, we study the properties of input data that may contribute to the prunability of a neural network. For high dimens…

    Submitted 1 December, 2022; originally announced December 2022.
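The abstract's definition of prunability can be made concrete with a small helper. This sketch assumes test accuracy has already been measured at a set of candidate sparsity levels, and it treats "without compromising" as staying within a hypothetical `tolerance` of the dense model's accuracy; neither the pruning method nor the tolerance comes from the paper.

```python
from typing import Callable, Sequence


def prunability(sparsities: Sequence[float],
                accuracy_at: Callable[[float], float],
                tolerance: float = 0.0) -> float:
    """Largest fraction of weights removable while test accuracy stays within
    `tolerance` of the unpruned (sparsity 0.0) model.

    `accuracy_at(s)` is a hypothetical callback returning test accuracy after
    pruning a fraction `s` of the weights (plus any retraining the method uses).
    """
    dense_accuracy = accuracy_at(0.0)
    acceptable = [s for s in sparsities
                  if accuracy_at(s) >= dense_accuracy - tolerance]
    # If no candidate level qualifies, nothing can be removed.
    return max(acceptable, default=0.0)
```

For example, with measured accuracies `{0.0: 0.90, 0.5: 0.91, 0.8: 0.90, 0.9: 0.85}` and zero tolerance, the prunability is 0.8.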

  11. arXiv:2211.16677

    cs.CV cs.AI cs.GR

    3D Neural Field Generation using Triplane Diffusion

    Authors: J. Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, Gordon Wetzstein

    Abstract: Diffusion models have emerged as the state-of-the-art for image generation, among other tasks. Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields and factoring them into a set of axis-aligned triplane feature representations. Thus, our 3D t…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Project page: https://jryanshue.com/nfd