Showing 1–10 of 10 results for author: Kitaev, N

  1. arXiv:2010.05315  [pdf, other]

    cs.LG

    SMYRF: Efficient Attention using Asymmetric Clustering

    Authors: Giannis Daras, Nikita Kitaev, Augustus Odena, Alexandros G. Dimakis

    Abstract: We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where $N$ is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new Asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: 30 pages, 10 figures

  2. arXiv:2010.03146  [pdf, ps, other]

    cs.CL cs.LG

    Unsupervised Parsing via Constituency Tests

    Authors: Steven Cao, Nikita Kitaev, Dan Klein

    Abstract: We propose a method for unsupervised parsing based on the linguistic notion of a constituency test. One type of constituency test involves modifying the sentence via some transformation (e.g. replacing the span with a pronoun) and then judging the result (e.g. checking if it is grammatical). Motivated by this idea, we design an unsupervised parser by specifying a set of transformations and using a…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  3. arXiv:2002.03518  [pdf, other]

    cs.CL cs.LG

    Multilingual Alignment of Contextual Word Representations

    Authors: Steven Cao, Nikita Kitaev, Dan Klein

    Abstract: We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in analyzing and improving multilingual BERT. In particular, after our proposed alignment procedure, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model, remarkably matching pseudo-fully-supervised translate-train models for Bulgarian and Gre…

    Submitted 12 February, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: ICLR 2020

  4. arXiv:2001.04451  [pdf, other]

    cs.LG cs.CL stat.ML

    Reformer: The Efficient Transformer

    Authors: Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

    Abstract: Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O($L^2$) to O($L\log L$), where $L$ is…

    Submitted 18 February, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: ICLR 2020

  5. arXiv:1907.04347  [pdf, other]

    cs.CL

    Cross-Domain Generalization of Neural Constituency Parsers

    Authors: Daniel Fried, Nikita Kitaev, Dan Klein

    Abstract: Neural parsers obtain state-of-the-art results on benchmark treebanks for constituency parsing -- but to what degree do they generalize to other domains? We present three results about the generalization of neural parsers in a zero-shot setting: training on trees from one corpus and evaluating on out-of-domain corpora. First, neural and non-neural parsers generalize comparably to new domains. Seco…

    Submitted 9 July, 2019; originally announced July 2019.

    Comments: ACL 2019. DF and NK contributed equally

  6. arXiv:1906.01604  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    KERMIT: Generative Insertion-Based Modeling for Sequences

    Authors: William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit

    Abstract: We present KERMIT, a simple insertion-based approach to generative modeling for sequences and sequence pairs. KERMIT models the joint distribution and its decompositions (i.e., marginals and conditionals) using a single neural network and, unlike much prior work, does not rely on a prespecified factorization of the data distribution. During training, one can feed KERMIT paired data $(x, y)$ to lea…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: William Chan, Nikita Kitaev, Kelvin Guu, and Mitchell Stern contributed equally

  7. arXiv:1904.09745  [pdf, other]

    cs.CL

    Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference

    Authors: Nikita Kitaev, Dan Klein

    Abstract: We present a constituency parsing algorithm that, like a supertagger, works by assigning labels to each word in a sentence. In order to maximally leverage current neural architectures, the model scores each word's tags in parallel, with minimal task-specific structure. After scoring, a left-to-right reconciliation phase extracts a tree in (empirically) linear time. Our parser achieves 95.4 F1 on t…

    Submitted 28 June, 2020; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: ACL 2020

  8. arXiv:1812.11760  [pdf, other]

    cs.CL

    Multilingual Constituency Parsing with Self-Attention and Pre-Training

    Authors: Nikita Kitaev, Steven Cao, Dan Klein

    Abstract: We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find…

    Submitted 4 June, 2019; v1 submitted 31 December, 2018; originally announced December 2018.

    Comments: ACL 2019

  9. arXiv:1805.01052  [pdf, other]

    cs.CL

    Constituency Parsing with a Self-Attentive Encoder

    Authors: Nikita Kitaev, Dan Klein

    Abstract: We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. The use of attention makes explicit the manner in which information is propagated between different locations in the sentence, which we use to both analyze our model and propose potential improvements. For example, we find that separati…

    Submitted 2 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  10. arXiv:1712.05558  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication

    Authors: Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh

    Abstract: In this work, we propose a goal-driven collaborative task that combines language, perception, and action. Specifically, we develop a Collaborative image-Drawing game between two agents, called CoDraw. Our game is grounded in a virtual world that contains movable clip art objects. The game involves two players: a Teller and a Drawer. The Teller sees an abstract scene containing multiple clip art pi…

    Submitted 4 June, 2019; v1 submitted 15 December, 2017; originally announced December 2017.

    Comments: ACL 2019