Showing 1–10 of 10 results for author: Hicke, R M M

  1. arXiv:2505.14925 [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels

    Authors: Sil Hamilton, Rebecca M. M. Hicke, Matthew Wilkens, David Mimno

    Abstract: Although the context length of large language models (LLMs) has increased to millions of tokens, evaluating their effectiveness beyond needle-in-a-haystack approaches has proven difficult. We argue that novels provide a case study of subtle, complicated structure and long-range semantic dependencies often over 128k tokens in length. Inspired by work on computational novel analysis, we release the…

    Submitted 20 May, 2025; originally announced May 2025.

  2. arXiv:2504.06393 [pdf, other]

    cs.CL cs.LG

    The Zero Body Problem: Probing LLM Use of Sensory Language

    Authors: Rebecca M. M. Hicke, Sil Hamilton, David Mimno

    Abstract: Sensory language expresses embodied experiences ranging from taste and sound to excitement and stomachache. This language is of interest to scholars from a wide range of domains including robotics, narratology, linguistics, and cognitive science. In this work, we explore whether language models, which are not embodied, can approximate human use of embodied language. We extend an existing corpus of…

    Submitted 8 April, 2025; originally announced April 2025.

  3. arXiv:2502.19590 [pdf, other]

    cs.CL cs.LG cs.SI

    A City of Millions: Mapping Literary Social Networks At Scale

    Authors: Sil Hamilton, Rebecca M. M. Hicke, David Mimno, Matthew Wilkens

    Abstract: We release 70,509 high-quality social networks extracted from multilingual fiction and nonfiction narratives. We additionally provide metadata for ~30,000 of these texts (73% nonfiction and 27% fiction) written between 1800 and 1999 in 58 languages. This dataset provides information on historical social worlds at an unprecedented scale, including data for 2,510,021 individuals in 2,805,482…

    Submitted 28 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  4. arXiv:2502.03647 [pdf, other]

    cs.CL cs.LG

    Looking for the Inner Music: Probing LLMs' Understanding of Literary Style

    Authors: Rebecca M. M. Hicke, David Mimno

    Abstract: Recent work has demonstrated that language models can be trained to identify the author of much shorter literary passages than has been thought feasible for traditional stylometry. We replicate these results for authorship and extend them to a new dataset measuring novel genre. We find that LLMs are able to distinguish authorship and genre, but they do so in different ways. Some models seem to rel…

    Submitted 5 February, 2025; originally announced February 2025.

  5. arXiv:2410.12791 [pdf, other]

    cs.CL

    Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media

    Authors: Ross Deans Kristensen-McLachlan, Rebecca M. M. Hicke, Márton Kardos, Mette Thunø

    Abstract: Does the People's Republic of China (PRC) interfere with European elections through ethnic Chinese diaspora media? This question forms the basis of an ongoing research project exploring how PRC narratives about European elections are represented in Chinese diaspora media, and thus the objectives of PRC news media manipulation. In order to study diaspora media efficiently and at scale, it is necess…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: Accepted to the 2024 Computational Humanities Research Conference (CHR)

  6. arXiv:2410.08991 [pdf, other]

    cs.CL cs.LG

    Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory

    Authors: Rebecca M. M. Hicke, Ross Deans Kristensen-McLachlan

    Abstract: Metaphors are everywhere. They appear extensively across all domains of natural language, from the most sophisticated poetry to seemingly dry academic prose. A significant body of research in the cognitive science of language argues for the existence of conceptual metaphors, the systematic structuring of one domain of experience in the language of another. Conceptual metaphors are not simply rheto…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: Accepted to the 2024 Computational Humanities Research Conference (CHR)

  7. arXiv:2409.11390 [pdf, other]

    cs.CL cs.LG

    Says Who? Effective Zero-Shot Annotation of Focalization

    Authors: Rebecca M. M. Hicke, Yuri Bizzoni, Pascale Feldkamp, Ross Deans Kristensen-McLachlan

    Abstract: Focalization, the perspective through which narrative is presented, is encoded via a wide range of lexico-grammatical features and is subject to reader interpretation. Even trained annotators frequently disagree on correct labels, suggesting this task is both qualitatively and computationally challenging. In this work, we test how well five contemporary large language model (LLM) families and two…

    Submitted 28 March, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

  8. arXiv:2401.17922 [pdf, other]

    cs.CL

    [Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs

    Authors: Rebecca M. M. Hicke, David Mimno

    Abstract: Coreference annotation and resolution is a vital component of computational literary studies. However, it has previously been difficult to build high quality systems for fiction. Coreference requires complicated structured outputs, and literary text involves subtle inferences and highly varied language. New language-model-based seq2seq systems present the opportunity to solve both these problems b…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted to LaTeCH-CLfL 2024

  9. arXiv:2310.18454 [pdf, other]

    cs.CL cs.LG

    T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models

    Authors: Rebecca M. M. Hicke, David Mimno

    Abstract: Large language models have shown breakthrough potential in many NLP domains. Here we consider their use for stylometry, specifically authorship identification in Early Modern English drama. We find both promising and concerning results; LLMs are able to accurately predict the author of surprisingly short passages but are also prone to confidently misattribute texts to specific authors. A fine-tune…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Published in CHR 2023

  10. arXiv:2210.08059 [pdf, other]

    cs.HC

    Word Clouds in the Wild

    Authors: Rebecca M. M. Hicke, Maanya Goenka, Eric Alexander

    Abstract: Word clouds are frequently used to analyze and communicate text data in many domains. In order to help guide research on improving the legibility of word clouds, we have conducted a survey of their usage in Digital Humanities academia and journalism. Using a modified grounded theory approach, we sought to identify the most common purposes for which word clouds were employed and the most common vis…

    Submitted 14 October, 2022; originally announced October 2022.