-
Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels
Authors:
Sil Hamilton,
Rebecca M. M. Hicke,
Matthew Wilkens,
David Mimno
Abstract:
Although the context length of large language models (LLMs) has increased to millions of tokens, evaluating their effectiveness beyond needle-in-a-haystack approaches has proven difficult. We argue that novels provide a case study of subtle, complicated structure and long-range semantic dependencies often over 128k tokens in length. Inspired by work on computational novel analysis, we release the…
▽ More
Although the context length of large language models (LLMs) has increased to millions of tokens, evaluating their effectiveness beyond needle-in-a-haystack approaches has proven difficult. We argue that novels provide a case study of subtle, complicated structure and long-range semantic dependencies often over 128k tokens in length. Inspired by work on computational novel analysis, we release the Too Long, Didn't Model (TLDM) benchmark, which tests a model's ability to report plot summary, storyworld configuration, and elapsed narrative time. We find that none of seven tested frontier LLMs retain stable understanding beyond 64k tokens. Our results suggest language model developers must look beyond "lost in the middle" benchmarks when evaluating model performance in complex long-context scenarios. To aid in further development we release the TLDM benchmark together with reference code and data.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
The Zero Body Problem: Probing LLM Use of Sensory Language
Authors:
Rebecca M. M. Hicke,
Sil Hamilton,
David Mimno
Abstract:
Sensory language expresses embodied experiences ranging from taste and sound to excitement and stomachache. This language is of interest to scholars from a wide range of domains including robotics, narratology, linguistics, and cognitive science. In this work, we explore whether language models, which are not embodied, can approximate human use of embodied language. We extend an existing corpus of…
▽ More
Sensory language expresses embodied experiences ranging from taste and sound to excitement and stomachache. This language is of interest to scholars from a wide range of domains including robotics, narratology, linguistics, and cognitive science. In this work, we explore whether language models, which are not embodied, can approximate human use of embodied language. We extend an existing corpus of parallel human and model responses to short story prompts with an additional 18,000 stories generated by 18 popular models. We find that all models generate stories that differ significantly from human usage of sensory language, but the direction of these differences varies considerably between model families. Namely, Gemini models use significantly more sensory language than humans along most axes whereas most models from the remaining five families use significantly less. Linear probes run on five models suggest that they are capable of identifying sensory language. However, we find preliminary evidence suggesting that instruction tuning may discourage usage of sensory language. Finally, to support further work, we release our expanded story dataset.
△ Less
Submitted 8 April, 2025;
originally announced April 2025.
-
A City of Millions: Mapping Literary Social Networks At Scale
Authors:
Sil Hamilton,
Rebecca M. M. Hicke,
David Mimno,
Matthew Wilkens
Abstract:
We release 70,509 high-quality social networks extracted from multilingual fiction and nonfiction narratives. We additionally provide metadata for $\sim$30,000 of these texts (73\% nonfiction and 27\% fiction) written between 1800 and 1999 in 58 languages. This dataset provides information on historical social worlds at an unprecedented scale, including data for 2,510,021 individuals in 2,805,482…
▽ More
We release 70,509 high-quality social networks extracted from multilingual fiction and nonfiction narratives. We additionally provide metadata for $\sim$30,000 of these texts (73\% nonfiction and 27\% fiction) written between 1800 and 1999 in 58 languages. This dataset provides information on historical social worlds at an unprecedented scale, including data for 2,510,021 individuals in 2,805,482 pair-wise relationships annotated for affinity and relationship type. We achieve this scale by automating previously manual methods of extracting social networks; specifically, we adapt an existing annotation task as a language model prompt, ensuring consistency at scale with the use of structured output. This dataset serves as a unique resource for humanities and social science research by providing data on cognitive models of social realities.
△ Less
Submitted 28 March, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Looking for the Inner Music: Probing LLMs' Understanding of Literary Style
Authors:
Rebecca M. M. Hicke,
David Mimno
Abstract:
Recent work has demonstrated that language models can be trained to identify the author of much shorter literary passages than has been thought feasible for traditional stylometry. We replicate these results for authorship and extend them to a new dataset measuring novel genre. We find that LLMs are able to distinguish authorship and genre, but they do so in different ways. Some models seem to rel…
▽ More
Recent work has demonstrated that language models can be trained to identify the author of much shorter literary passages than has been thought feasible for traditional stylometry. We replicate these results for authorship and extend them to a new dataset measuring novel genre. We find that LLMs are able to distinguish authorship and genre, but they do so in different ways. Some models seem to rely more on memorization, while others benefit more from training to learn author/genre characteristics. We then use three methods to probe one high-performing LLM for features that define style. These include direct syntactic ablations to input text as well as two methods that look at model internals. We find that authorial style is easier to define than genre-level style and is more impacted by minor syntactic decisions and contextual word usage. However, some traits like pronoun usage and word order prove significant for defining both kinds of literary style.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media
Authors:
Ross Deans Kristensen-McLachlan,
Rebecca M. M. Hicke,
Márton Kardos,
Mette Thunø
Abstract:
Does the People's Republic of China (PRC) interfere with European elections through ethnic Chinese diaspora media? This question forms the basis of an ongoing research project exploring how PRC narratives about European elections are represented in Chinese diaspora media, and thus the objectives of PRC news media manipulation. In order to study diaspora media efficiently and at scale, it is necess…
▽ More
Does the People's Republic of China (PRC) interfere with European elections through ethnic Chinese diaspora media? This question forms the basis of an ongoing research project exploring how PRC narratives about European elections are represented in Chinese diaspora media, and thus the objectives of PRC news media manipulation. In order to study diaspora media efficiently and at scale, it is necessary to use techniques derived from quantitative text analysis, such as topic modelling. In this paper, we present a pipeline for studying information dynamics in Chinese media. Firstly, we present KeyNMF, a new approach to static and dynamic topic modelling using transformer-based contextual embedding models. We provide benchmark evaluations to demonstrate that our approach is competitive on a number of Chinese datasets and metrics. Secondly, we integrate KeyNMF with existing methods for describing information dynamics in complex systems. We apply this pipeline to data from five news sites, focusing on the period of time leading up to the 2024 European parliamentary elections. Our methods and results demonstrate the effectiveness of KeyNMF for studying information dynamics in Chinese media and lay groundwork for further work addressing the broader research questions.
△ Less
Submitted 16 October, 2024;
originally announced October 2024.
-
Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory
Authors:
Rebecca M. M. Hicke,
Ross Deans Kristensen-McLachlan
Abstract:
Metaphors are everywhere. They appear extensively across all domains of natural language, from the most sophisticated poetry to seemingly dry academic prose. A significant body of research in the cognitive science of language argues for the existence of conceptual metaphors, the systematic structuring of one domain of experience in the language of another. Conceptual metaphors are not simply rheto…
▽ More
Metaphors are everywhere. They appear extensively across all domains of natural language, from the most sophisticated poetry to seemingly dry academic prose. A significant body of research in the cognitive science of language argues for the existence of conceptual metaphors, the systematic structuring of one domain of experience in the language of another. Conceptual metaphors are not simply rhetorical flourishes but are crucial evidence of the role of analogical reasoning in human cognition. In this paper, we ask whether Large Language Models (LLMs) can accurately identify and explain the presence of such conceptual metaphors in natural language data. Using a novel prompting technique based on metaphor annotation guidelines, we demonstrate that LLMs are a promising tool for large-scale computational research on conceptual metaphors. Further, we show that LLMs are able to apply procedural guidelines designed for human annotators, displaying a surprising depth of linguistic knowledge.
△ Less
Submitted 11 October, 2024;
originally announced October 2024.
-
Says Who? Effective Zero-Shot Annotation of Focalization
Authors:
Rebecca M. M. Hicke,
Yuri Bizzoni,
Pascale Feldkamp,
Ross Deans Kristensen-McLachlan
Abstract:
Focalization, the perspective through which narrative is presented, is encoded via a wide range of lexico-grammatical features and is subject to reader interpretation. Even trained annotators frequently disagree on correct labels, suggesting this task is both qualitatively and computationally challenging. In this work, we test how well five contemporary large language model (LLM) families and two…
▽ More
Focalization, the perspective through which narrative is presented, is encoded via a wide range of lexico-grammatical features and is subject to reader interpretation. Even trained annotators frequently disagree on correct labels, suggesting this task is both qualitatively and computationally challenging. In this work, we test how well five contemporary large language model (LLM) families and two baselines perform when annotating short literary excerpts for focalization. Despite the challenging nature of the task, we find that LLMs show comparable performance to trained human annotators, with GPT-4o achieving an average F1 of 84.79%. Further, we demonstrate that the log probabilities output by GPT-family models frequently reflect the difficulty of annotating particular excerpts. Finally, we provide a case study analyzing sixteen Stephen King novels, demonstrating the usefulness of this approach for computational literary studies and the insights gleaned from examining focalization at scale.
△ Less
Submitted 28 March, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs
Authors:
Rebecca M. M. Hicke,
David Mimno
Abstract:
Coreference annotation and resolution is a vital component of computational literary studies. However, it has previously been difficult to build high quality systems for fiction. Coreference requires complicated structured outputs, and literary text involves subtle inferences and highly varied language. New language-model-based seq2seq systems present the opportunity to solve both these problems b…
▽ More
Coreference annotation and resolution is a vital component of computational literary studies. However, it has previously been difficult to build high quality systems for fiction. Coreference requires complicated structured outputs, and literary text involves subtle inferences and highly varied language. New language-model-based seq2seq systems present the opportunity to solve both these problems by learning to directly generate a copy of an input sentence with markdown-like annotations. We create, evaluate, and release several trained models for coreference, as well as a workflow for training new models.
△ Less
Submitted 31 January, 2024;
originally announced January 2024.
-
T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models
Authors:
Rebecca M. M. Hicke,
David Mimno
Abstract:
Large language models have shown breakthrough potential in many NLP domains. Here we consider their use for stylometry, specifically authorship identification in Early Modern English drama. We find both promising and concerning results; LLMs are able to accurately predict the author of surprisingly short passages but are also prone to confidently misattribute texts to specific authors. A fine-tune…
▽ More
Large language models have shown breakthrough potential in many NLP domains. Here we consider their use for stylometry, specifically authorship identification in Early Modern English drama. We find both promising and concerning results; LLMs are able to accurately predict the author of surprisingly short passages but are also prone to confidently misattribute texts to specific authors. A fine-tuned t5-large model outperforms all tested baselines, including logistic regression, SVM with a linear kernel, and cosine delta, at attributing small passages. However, we see indications that the presence of certain authors in the model's pre-training data affects predictive results in ways that are difficult to assess.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Word Clouds in the Wild
Authors:
Rebecca M. M. Hicke,
Maanya Goenka,
Eric Alexander
Abstract:
Word clouds are frequently used to analyze and communicate text data in many domains. In order to help guide research on improving the legibility of word clouds, we have conducted a survey of their usage in Digital Humanities academia and journalism. Using a modified grounded theory approach, we sought to identify the most common purposes for which word clouds were employed and the most common vis…
▽ More
Word clouds are frequently used to analyze and communicate text data in many domains. In order to help guide research on improving the legibility of word clouds, we have conducted a survey of their usage in Digital Humanities academia and journalism. Using a modified grounded theory approach, we sought to identify the most common purposes for which word clouds were employed and the most common visual encodings they contained. Our findings indicate that font size, color, and word placement dominate as the primary data-encoding channels, as we hypothesized. Perhaps more surprisingly, we found that asking viewers to perform analytical tasks with word clouds was relatively common, especially in DH sources. This suggests that research into the interactions of these visual encoding channels (particularly in regards to legibility) is warranted.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.