Showing 1–8 of 8 results for author: Berlot-Attwell, I

  1. arXiv:2504.03048

    cs.LG cs.CL

    LLM Library Learning Fails: A LEGO-Prover Case Study

    Authors: Ian Berlot-Attwell, Frank Rudzicz, Xujie Si

    Abstract: Recent advancements in the coding, reasoning, and tool-using abilities of LLMs have spurred interest in library learning (i.e., online learning through the creation, storage, and retrieval of reusable and composable functions, knowledge, checklists, or lemmas). Such systems often promise improved task performance through the automatic creation of broadly applicable tools, as well as superior compu…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 24 pages, 5 figures

  2. arXiv:2410.20274

    cs.LG cs.CL cs.SC

    Library Learning Doesn't: The Curious Case of the Single-Use "Library"

    Authors: Ian Berlot-Attwell, Frank Rudzicz, Xujie Si

    Abstract: Advances in Large Language Models (LLMs) have spurred a wave of LLM library learning systems for mathematical reasoning. These systems aim to learn a reusable library of tools, such as formal Isabelle lemmas or Python programs that are tailored to a family of tasks. Many of these systems are inspired by the human structuring of knowledge into reusable and extendable concepts, but do current method…

    Submitted 26 October, 2024; originally announced October 2024.

    Comments: 24 pages, 7 figures. Accepted to the 4th MATH-AI Workshop at NeurIPS'24

  3. arXiv:2311.08695

    cs.LG cs.CL cs.CV

    Attribute Diversity Determines the Systematicity Gap in VQA

    Authors: Ian Berlot-Attwell, Kumar Krishna Agrawal, A. Michael Carrell, Yash Sharma, Naomi Saphra

    Abstract: Although modern neural networks often generalize to new combinations of familiar concepts, the conditions that enable such compositionality have long been an open question. In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. To test, we introduce a novel diagnostic d…

    Submitted 4 October, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 36 pages, 27 figures, EMNLP 2024

  4. Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric

    Authors: Ian Berlot-Attwell, Frank Rudzicz

    Abstract: In this work, we evaluate various existing dialogue relevance metrics, find strong dependency on the dataset, often with poor correlation with human scores of relevance, and propose modifications to reduce data requirements and domain sensitivity while improving correlation. Our proposed metric achieves state-of-the-art performance on the HUMOD dataset while reducing measured sensitivity to datase…

    Submitted 3 June, 2022; originally announced June 2022.

    Comments: 18 pages, 7 figures

    Journal ref: In Proceedings of the 4th Workshop on NLP for Conversational AI, pages 166-183, Dublin, Ireland. Association for Computational Linguistics. May 2022

  5. arXiv:2112.02721

    cs.CL cs.AI cs.LG

    NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)

    Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data split…

    Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

  6. arXiv:2104.06365

    cs.LG cs.CV

    Neuro-Symbolic VQA: A review from the perspective of AGI desiderata

    Authors: Ian Berlot-Attwell

    Abstract: An ultimate goal of the AI and ML fields is artificial general intelligence (AGI); although such systems remain science fiction, various models exhibit aspects of AGI. In this work, we look at neuro-symbolic (NS) approaches to visual question answering (VQA) from the perspective of AGI desiderata. We see how well these systems meet these desiderata, and how the desiderata often pull the scientist i…

    Submitted 13 April, 2021; originally announced April 2021.

  7. arXiv:2104.06335

    cs.CL cs.LG

    On the Use of Linguistic Features for the Evaluation of Generative Dialogue Systems

    Authors: Ian Berlot-Attwell, Frank Rudzicz

    Abstract: Automatically evaluating text-based, non-task-oriented dialogue systems (i.e., "chatbots") remains an open problem. Previous approaches have suffered challenges ranging from poor correlation with human judgment to poor generalization and have often required a gold standard reference for comparison or human-annotated data. Extending existing evaluation methods, we propose that a metric based on lin…

    Submitted 13 April, 2021; originally announced April 2021.

  8. arXiv:2011.09625

    cs.CL cs.AI

    Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

    Authors: John Chen, Ian Berlot-Attwell, Safwan Hossain, Xindi Wang, Frank Rudzicz

    Abstract: Clinical machine learning is increasingly multimodal, collected in both structured tabular formats and unstructured forms such as freetext. We propose a novel task of exploring fairness on a multimodal clinical dataset, adopting equalized odds for the downstream medical prediction tasks. To this end, we investigate a modality-agnostic fairness algorithm - equalized odds post processing - and compa…

    Submitted 10 June, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: Best paper award at 3rd Clinical Natural Language Processing Workshop at EMNLP 2020

    Journal ref: Proceedings of the 3rd Clinical Natural Language Processing Workshop (2020), pages 301-312