Showing 1–12 of 12 results for author: Hoyle, A

  1. arXiv:2502.14748  [pdf, ps, other]

    cs.CL

    Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of Topic Models

    Authors: Zongxia Li, Lorena Calvo-Bartolomé, Alexander Hoyle, Paiheng Xu, Alden Dima, Juan Francisco Fung, Jordan Boyd-Graber

    Abstract: A common use of NLP is to facilitate the understanding of large document collections, with a shift from using traditional topic models to Large Language Models. Yet the effectiveness of using LLM for large corpus understanding in real-world applications remains under-explored. This study measures the knowledge users acquire with unsupervised, supervised LLM-based exploratory approaches or traditio…

    Submitted 4 June, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 22 Pages. LLM for Data Exploration and content analysis, Topic Models. 63rd Annual Meeting of the Association for Computational Linguistics (2025)

  2. arXiv:2406.15352  [pdf, other]

    cs.CL

    A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick

    Authors: Nishant Balepur, Matthew Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber

    Abstract: Keyword mnemonics are memorable explanations that link new terms to simpler keywords. Prior work generates mnemonics for students, but they do not train models using mnemonics students prefer and aid learning. We build SMART, a mnemonic generator trained on feedback from real students learning new terms. To train SMART, we first fine-tune LLaMA-2 on a curated set of user-written mnemonics. We then…

    Submitted 4 October, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: EMNLP 2024

  3. arXiv:2406.06608  [pdf, other]

    cs.CL cs.AI

    The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

    Authors: Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, et al. (6 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across diverse industries and research domains. Developers and end-users interact with these systems through the use of prompting and prompt engineering. Although prompt engineering is a widely adopted and extensively researched area, it suffers from conflicting terminology and a fragmented ontological understanding…

    Submitted 26 February, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2311.01449  [pdf, other]

    cs.CL

    TopicGPT: A Prompt-based Topic Modeling Framework

    Authors: Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer

    Abstract: Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control over the formatting and specificity of resulting topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large lan…

    Submitted 1 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024 (Main conference)

  5. arXiv:2305.14583  [pdf, other]

    cs.CL

    Natural Language Decompositions of Implicit Content Enable Better Text Representations

    Authors: Alexander Hoyle, Rupak Sarkar, Pranav Goel, Philip Resnik

    Abstract: When people interpret text, they rely on inferences that go beyond the observed language itself. Inspired by this observation, we introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed, then validate the plausibilit…

    Submitted 24 February, 2025; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 (Main conference)

  6. arXiv:2305.12152  [pdf, other]

    cs.CL

    Revisiting Automated Topic Model Evaluation with Large Language Models

    Authors: Dominik Stammbach, Vilém Zouhar, Alexander Hoyle, Mrinmaya Sachan, Elliott Ash

    Abstract: Topic models are used to make sense of large text collections. However, automatically evaluating topic model output and determining the optimal number of topics both have been longstanding challenges, with no effective automated solutions to date. This paper proposes using large language models to evaluate such output. We find that large language models appropriately assess the resulting topics, c…

    Submitted 22 October, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Journal ref: Forthcoming in EMNLP 2023

  7. arXiv:2210.16162  [pdf, other]

    cs.CL cs.HC

    Are Neural Topic Models Broken?

    Authors: Alexander Hoyle, Pranav Goel, Rupak Sarkar, Philip Resnik

    Abstract: Recently, the relationship between automated and human evaluation of topic models has been called into question. Method developers have staked the efficacy of new topic model variants on automated measures, and their failure to approximate human preferences places these models on uncertain ground. Moreover, existing evaluation paradigms are often divorced from real-world use. Motivated by conten…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted to Findings of EMNLP 2022

  8. arXiv:2107.02173  [pdf, other]

    cs.CL cs.LG

    Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence

    Authors: Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, Philip Resnik

    Abstract: Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap:…

    Submitted 27 October, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: Accepted to NeurIPS 2021 (spotlight presentation). CR version

  9. arXiv:2012.15793  [pdf, other]

    cs.CL

    Promoting Graph Awareness in Linearized Graph-to-Text Generation

    Authors: Alexander Hoyle, Ana Marasović, Noah Smith

    Abstract: Generating text from structured inputs, such as meaning representations or RDF triples, has often involved the use of specialized graph-encoding neural networks. However, recent applications of pretrained transformers to linearizations of graph inputs have yielded state-of-the-art generation results on graph-to-text tasks. Here, we explore the ability of these linearized models to encode local gra…

    Submitted 31 December, 2020; originally announced December 2020.

  10. arXiv:2010.02377  [pdf, other]

    cs.CL cs.IR cs.LG

    Improving Neural Topic Models using Knowledge Distillation

    Authors: Alexander Hoyle, Pranav Goel, Philip Resnik

    Abstract: Topic models are often used to identify human-interpretable topics to help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our modular method can be straightforwardly applied with any neural topic model to improve topic quality, which we demonstrate using two models having disparate ar…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted to EMNLP 2020

  11. arXiv:1906.04760  [pdf, other]

    cs.CL

    Unsupervised Discovery of Gendered Language through Latent-Variable Modeling

    Authors: Alexander Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Isabelle Augenstein, Ryan Cotterell

    Abstract: Studying the ways in which language is gendered has long been an area of interest in sociolinguistics. Studies have explored, for example, the speech of male and female characters in film and the language used to describe male and female politicians. In this paper, we aim not to merely study this phenomenon qualitatively, but instead to quantify the degree to which the language used to describe me…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: To appear in ACL 2019

  12. arXiv:1904.02839  [pdf, other]

    cs.CL cs.LG

    Combining Sentiment Lexica with a Multi-View Variational Autoencoder

    Authors: Alexander Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Ryan Cotterell, Isabelle Augenstein

    Abstract: When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning polarities to words in a sentiment lexicon, annotators may use binary, categorical, or continuous labels. Naturally, it is of interest to unify these labels from disparate scales to both achieve maximal coverage over words and to create a single, more robust sentimen…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: To appear in NAACL-HLT 2019