Showing 1–11 of 11 results for author: Kiela, D

  1. arXiv:2311.11944  [pdf, other]

    cs.CL cs.AI cs.CE stat.ML

    FinanceBench: A New Benchmark for Financial Question Answering

    Authors: Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen

    Abstract: FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are intended to be clear-cut and straightforward to answer…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Dataset is available at: https://huggingface.co/datasets/PatronusAI/financebench

  2. arXiv:2105.11447  [pdf, other]

    cs.CL cs.LG stat.ML

    True Few-Shot Learning with Language Models

    Authors: Ethan Perez, Douwe Kiela, Kyunghyun Cho

    Abstract: Pretrained language models (LMs) perform well on many tasks even when learning from a few examples, but prior work uses many held-out examples to tune various aspects of learning, such as hyperparameters, training objectives, and natural language templates ("prompts"). Here, we evaluate the few-shot ability of LMs when such held-out examples are unavailable, a setting we call true few-shot learnin…

    Submitted 24 May, 2021; originally announced May 2021.

    Comments: Code at https://github.com/ethanjperez/true_few_shot

  3. arXiv:2103.03872  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Rissanen Data Analysis: Examining Dataset Characteristics via Description Length

    Authors: Ethan Perez, Douwe Kiela, Kyunghyun Cho

    Abstract: We introduce a method to determine if a certain capability helps to achieve an accurate model of given data. We view labels as being generated from the inputs by a program composed of subroutines with different capabilities, and we posit that a subroutine is useful if and only if the minimal program that invokes it is shorter than the one that does not. Since minimum program length is uncomputable…

    Submitted 5 March, 2021; originally announced March 2021.

    Comments: Code at https://github.com/ethanjperez/rda along with a script to run RDA on your own dataset

  4. arXiv:2009.12789  [pdf, other]

    cs.LG cs.IT stat.ML

    Learning Optimal Representations with the Decodable Information Bottleneck

    Authors: Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam

    Abstract: We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked…

    Submitted 16 July, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Comments: Accepted at NeurIPS 2020

  5. arXiv:2002.02878  [pdf, other]

    cs.AI cs.CL stat.ML

    I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

    Authors: Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam

    Abstract: Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the div…

    Submitted 10 February, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  6. arXiv:2002.01093  [pdf, other]

    cs.CL cs.AI cs.LG cs.MA stat.ML

    On the interaction between supervision and self-play in emergent communication

    Authors: Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau

    Abstract: A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training. However, recent work suggests that current machine learning methods are too data inefficient to be trained in this way from scratch. In this paper, we investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency: imi…

    Submitted 22 June, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: The first two authors contributed equally. Accepted at ICLR 2020

  7. arXiv:1910.12892  [pdf, other]

    cs.LG stat.ML

    Hyperbolic Graph Neural Networks

    Authors: Qi Liu, Maximilian Nickel, Douwe Kiela

    Abstract: Learning from graph-structured data is an important task in machine learning and artificial intelligence, for which Graph Neural Networks (GNNs) have shown great promise. Motivated by recent advances in geometric representation learning, we propose a novel GNN architecture for learning representations on Riemannian manifolds with differentiable exponential and logarithmic maps. We develop a scalab…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: Published at NeurIPS 2019

  8. arXiv:1910.01727  [pdf, other]

    cs.LG stat.ML

    Generalized Inner Loop Meta-Learning

    Authors: Edward Grefenstette, Brandon Amos, Denis Yarats, Phu Mon Htut, Artem Molchanov, Franziska Meier, Douwe Kiela, Kyunghyun Cho, Soumith Chintala

    Abstract: Many (but not all) approaches self-qualifying as "meta-learning" in deep learning and reinforcement learning fit a common pattern of approximating the solution to a nested optimization problem. In this paper, we give a formalization of this shared pattern, which we call GIMLI, prove its general requirements, and derive a general-purpose algorithm for implementing similar approaches. Based on this…

    Submitted 7 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 17 pages, 3 figures, 1 algorithm

  9. arXiv:1909.02950  [pdf, other]

    cs.CL cs.CV cs.LG stat.ML

    Supervised Multimodal Bitransformers for Classifying Images and Text

    Authors: Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Ethan Perez, Davide Testuggine

    Abstract: Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks. The modern digital world is increasingly multimodal, however, and textual information is often accompanied by other modalities such as images. We introduce a supervised multimodal bitransformer model that fuses information from text and image encoders,…

    Submitted 11 November, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

    Comments: Rejected from EMNLP, twice

  10. arXiv:1806.03417  [pdf, other]

    cs.AI cs.LG stat.ML

    Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry

    Authors: Maximilian Nickel, Douwe Kiela

    Abstract: We are concerned with the discovery of hierarchical relationships from large-scale unstructured similarity scores. For this purpose, we study different models of hyperbolic space and find that learning embeddings in the Lorentz model is substantially more efficient than in the Poincaré-ball model. We show that the proposed approach allows us to learn high-quality embeddings of large taxonomies whi…

    Submitted 8 July, 2018; v1 submitted 9 June, 2018; originally announced June 2018.

    Comments: Accepted at ICML'18

    ACM Class: I.2.0

  11. arXiv:1705.08039  [pdf, other]

    cs.AI cs.LG stat.ML

    Poincaré Embeddings for Learning Hierarchical Representations

    Authors: Maximilian Nickel, Douwe Kiela

    Abstract: Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical repre…

    Submitted 26 May, 2017; v1 submitted 22 May, 2017; originally announced May 2017.