Showing 1–7 of 7 results for author: Wilkens, M

Searching in archive cs.
  1. arXiv:2505.14925  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels

    Authors: Sil Hamilton, Rebecca M. M. Hicke, Matthew Wilkens, David Mimno

    Abstract: Although the context length of large language models (LLMs) has increased to millions of tokens, evaluating their effectiveness beyond needle-in-a-haystack approaches has proven difficult. We argue that novels provide a case study of subtle, complicated structure and long-range semantic dependencies often over 128k tokens in length. Inspired by work on computational novel analysis, we release the…

    Submitted 20 May, 2025; originally announced May 2025.

  2. arXiv:2505.00030  [pdf, other]

    cs.CL

    Can Language Models Represent the Past without Anachronism?

    Authors: Ted Underwood, Laura K. Nelson, Matthew Wilkens

    Abstract: Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not produce output consistent with period style. Fine-tuning produces results that are stylistically convincing enough to fool an automated judge, but human evaluators can still distinguish fine-tuned model…

    Submitted 27 April, 2025; originally announced May 2025.

  3. arXiv:2504.01349  [pdf, ps, other]

    cs.CL

    Tasks and Roles in Legal AI: Data Curation, Annotation, and Verification

    Authors: Allison Koenecke, Jed Stiglitz, David Mimno, Matthew Wilkens

    Abstract: The application of AI tools to the legal field feels natural: large legal document collections could be used with specialized AI to improve workflow efficiency for lawyers and ameliorate the "justice gap" for underserved clients. However, legal documents differ from the web-based text that underlies most AI systems. The challenges of legal AI are both specific to the legal domain, and confounded w…

    Submitted 2 April, 2025; originally announced April 2025.

  4. arXiv:2502.19590  [pdf, other]

    cs.CL cs.LG cs.SI

    A City of Millions: Mapping Literary Social Networks At Scale

    Authors: Sil Hamilton, Rebecca M. M. Hicke, David Mimno, Matthew Wilkens

    Abstract: We release 70,509 high-quality social networks extracted from multilingual fiction and nonfiction narratives. We additionally provide metadata for ~30,000 of these texts (73% nonfiction and 27% fiction) written between 1800 and 1999 in 58 languages. This dataset provides information on historical social worlds at an unprecedented scale, including data for 2,510,021 individuals in 2,805,482…

    Submitted 28 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  5. arXiv:2401.07340  [pdf]

    cs.CL

    The Afterlives of Shakespeare and Company in Online Social Readership

    Authors: Maria Antoniak, David Mimno, Rosamond Thalken, Melanie Walsh, Matthew Wilkens, Gregory Yauney

    Abstract: The growth of social reading platforms such as Goodreads and LibraryThing enables us to analyze reading activity at very large scale and in remarkable detail. But twenty-first century systems give us a perspective only on contemporary readers. Meanwhile, the digitization of the lending library records of Shakespeare and Company provides a window into the reading activity of an earlier, smaller com…

    Submitted 14 January, 2024; originally announced January 2024.

  6. arXiv:2310.18440  [pdf, other]

    cs.CL

    Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement

    Authors: Rosamond Thalken, Edward H. Stiglitz, David Mimno, Matthew Wilkens

    Abstract: Generative language models (LMs) are increasingly used for document class-prediction tasks and promise enormous improvements in cost and efficiency. Existing research often examines simple classification tasks, but the capability of LMs to classify on complex or specialized tasks is less well understood. We consider a highly complex task that is challenging even for humans: the classification of l…

    Submitted 27 October, 2023; originally announced October 2023.

    Journal ref: Published in EMNLP 2023

  7. arXiv:2305.17561  [pdf, other]

    cs.CL

    Grounding Characters and Places in Narrative Texts

    Authors: Sandeep Soni, Amanpreet Sihra, Elizabeth F. Evans, Matthew Wilkens, David Bamman

    Abstract: Tracking characters and locations throughout a story can help improve the understanding of its plot structure. Prior research has analyzed characters and locations from text independently without grounding characters to their locations in narrative time. Here, we address this gap by proposing a new spatial relationship categorization task. The objective of the task is to assign a spatial relations…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: 12 pages, 4 figures, 5 tables; to appear in the proceedings of ACL 2023