Showing 1–2 of 2 results for author: Ravula, A

  1. arXiv:2007.14062  [pdf, other]

    cs.LG cs.CL stat.ML

    Big Bird: Transformers for Longer Sequences

    Authors: Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

    Abstract: Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that…

    Submitted 8 January, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

    Journal ref: Neural Information Processing Systems (NeurIPS) 2020

  2. arXiv:2004.08483  [pdf, other]

    cs.LG stat.ML

    ETC: Encoding Long and Structured Inputs in Transformers

    Authors: Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang

    Abstract: Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs. To scale attention to longer inputs, we introduce a novel global-…

    Submitted 27 October, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: Accepted at EMNLP 2020