Showing 1–8 of 8 results for author: Paster, K

  1. arXiv:2409.00844

    cs.LG cs.AI cs.CL

    Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries

    Authors: Blair Yang, Fuyang Cui, Keiran Paster, Jimmy Ba, Pashootan Vaezipoor, Silviu Pitis, Michael R. Zhang

    Abstract: The rapid development and dynamic nature of large language models (LLMs) make it difficult for conventional quantitative benchmarks to accurately assess their capabilities. We propose report cards, which are human-interpretable, natural language summaries of model behavior for specific skills or topics. We develop a framework to evaluate report cards based on three criteria: specificity (ability t…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 11 pages, 8 figures

  2. arXiv:2310.10631

    cs.CL cs.AI cs.LO

    Llemma: An Open Language Model For Mathematics

    Authors: Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

    Abstract: We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool u…

    Submitted 15 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Updated references; corrected description of COPRA search budget

  3. arXiv:2310.06786

    cs.AI cs.CL cs.LG

    OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

    Authors: Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba

    Abstract: There is growing evidence that pretraining on high quality, carefully thought-out tokens such as code or mathematics plays an important role in improving the reasoning abilities of large language models. For example, Minerva, a PaLM model finetuned on billions of tokens of mathematical documents from arXiv and the web, reported dramatically improved performance on problems that require quantitativ…

    Submitted 10 October, 2023; originally announced October 2023.

  4. arXiv:2306.00937

    cs.AI cs.LG

    STEVE-1: A Generative Model for Text-to-Behavior in Minecraft

    Authors: Shalev Lifshitz, Keiran Paster, Harris Chan, Jimmy Ba, Sheila McIlraith

    Abstract: Constructing AI models that respond to text instructions is challenging, especially for sequential decision-making tasks. This work introduces a methodology, inspired by unCLIP, for instruction-tuning generative models of behavior without relying on a large dataset of instruction-labeled trajectories. Using this methodology, we create an instruction-tuned Video Pretraining (VPT) model called STEVE…

    Submitted 3 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

  5. arXiv:2211.01910

    cs.LG cs.AI cs.CL

    Large Language Models Are Human-Level Prompt Engineers

    Authors: Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba

    Abstract: By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we p…

    Submitted 10 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

  6. arXiv:2205.15967

    cs.LG cs.AI

    You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments

    Authors: Keiran Paster, Sheila McIlraith, Jimmy Ba

    Abstract: Recently, methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline RL tasks. However, simply conditioning a probabilistic model on a desired return and taking the predicted action can fail dramatically i…

    Submitted 27 November, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: Added experiments with Decision Transformers; Fixed error in Theorem 2.1; Updated related works; Added link for code

  7. arXiv:2110.14248

    cs.LG cs.AI

    Learning Domain Invariant Representations in Goal-conditioned Block MDPs

    Authors: Beining Han, Chongyi Zheng, Harris Chan, Keiran Paster, Michael R. Zhang, Jimmy Ba

    Abstract: Deep Reinforcement Learning (RL) is successful in solving many complex Markov Decision Processes (MDPs) problems. However, agents often face unanticipated environmental changes after deployment in the real world. These changes are often spurious and unrelated to the underlying problem, such as background shifts for visual input agents. Unfortunately, deep RL policies are usually sensitive to these…

    Submitted 27 October, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: 33 pages

    Journal ref: NeurIPS 2021

  8. arXiv:2012.02419

    cs.LG cs.AI

    Planning from Pixels using Inverse Dynamics Models

    Authors: Keiran Paster, Sheila A. McIlraith, Jimmy Ba

    Abstract: Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heur…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: 9 pages, 4 figures