
Showing 1–27 of 27 results for author: Hofmann, V

  1. arXiv:2506.00253  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race

    Authors: Lihao Sun, Chengzhi Mao, Valentin Hofmann, Xuechunzi Bai

    Abstract: Although value-aligned language models (LMs) appear unbiased in explicit bias evaluations, they often exhibit stereotypes in implicit word association tasks, raising concerns about their fair usage. We investigate the mechanisms behind this discrepancy and find that alignment surprisingly amplifies implicit bias in model outputs. Specifically, we show that aligned LMs, unlike their unaligned count…

    Submitted 8 June, 2025; v1 submitted 30 May, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 (Main)

  2. arXiv:2505.03054  [pdf, other]

    cs.AI cs.CL cs.SD eess.AS

    BLAB: Brutally Long Audio Bench

    Authors: Orevaoghene Ahia, Martijn Bartelds, Kabir Ahuja, Hila Gonen, Valentin Hofmann, Siddhant Arora, Shuyue Stella Li, Vishal Puttagunta, Mofetoluwa Adeyemi, Charishma Buchireddy, Ben Walls, Noah Bennett, Shinji Watanabe, Noah A. Smith, Yulia Tsvetkov, Sachin Kumar

    Abstract: Developing large audio language models (LMs) capable of understanding diverse spoken interactions is essential for accommodating the multimodal nature of human communication and can increase the accessibility of language technologies across different user populations. Recent work on audio LMs has primarily evaluated their performance on short audio segments, typically under 30 seconds, with limite…

    Submitted 12 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  3. arXiv:2503.13423  [pdf, other]

    cs.CL cs.LG

    SuperBPE: Space Travel for Language Models

    Authors: Alisa Liu, Jonathan Hayase, Valentin Hofmann, Sewoong Oh, Noah A. Smith, Yejin Choi

    Abstract: The assumption across nearly all language model (LM) tokenization schemes is that tokens should be subwords, i.e., contained within word boundaries. While providing a seemingly reasonable inductive bias, is this common practice limiting the potential of modern LMs? Whitespace is not a reliable delimiter of meaning, as evidenced by multi-word expressions (e.g., "by the way"), crosslingual variation…

    Submitted 14 April, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: updated related work

  4. arXiv:2502.08395  [pdf, other]

    cs.CL

    IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

    Authors: Paul Röttger, Musashi Hinck, Valentin Hofmann, Kobi Hackenburg, Valentina Pyatkin, Faeze Brahman, Dirk Hovy

    Abstract: Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates concerns about issue bias, where an LLM tends to present just one perspective on a given issue, which in turn may influence how users think about this issue. So far, it has not been possible to measure which issue biases LLMs ac…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: under review

  5. arXiv:2411.07990  [pdf, other]

    cs.CL cs.AI cs.LG

    Derivational Morphology Reveals Analogical Generalization in Large Language Models

    Authors: Valentin Hofmann, Leonie Weissweiler, David Mortensen, Hinrich Schütze, Janet Pierrehumbert

    Abstract: What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most studies analyzing the extent to which the language skills of LLMs resemble rules. As of yet, it is not known whether linguistic generalization in LLMs could equally well be explained as the result of analogical processes, which can be formalized as simil…

    Submitted 12 November, 2024; originally announced November 2024.

  6. arXiv:2410.11005  [pdf, ps, other]

    cs.CL cs.LG

    Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks

    Authors: Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Xun Wang, Si-Qing Chen, Michael Wooldridge, Janet B. Pierrehumbert, Furu Wei

    Abstract: Language is not monolithic. While benchmarks, including those designed for multiple languages, are often used as proxies to evaluate the performance of Large Language Models (LLMs), they tend to overlook the nuances of within-language variation and thus fail to model the experience of speakers of non-standard dialects. Focusing on African American Vernacular English (AAVE), we present the first st…

    Submitted 9 June, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: ACL 2025 main

  7. arXiv:2407.08818  [pdf]

    cs.CL

    MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

    Authors: Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hofmann, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith

    Abstract: In multilingual settings, non-Latin scripts and low-resource languages are usually disadvantaged in terms of language models' utility, efficiency, and cost. Specifically, previous studies have reported multiple modeling biases that the current tokenization algorithms introduce to non-Latin script languages, the main one being over-segmentation. In this work, we propose MAGNET; multilingual adaptiv…

    Submitted 16 November, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2403.00742  [pdf, other]

    cs.CL cs.AI cs.CY

    Dialect prejudice predicts AI decisions about people's character, employability, and criminality

    Authors: Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, Sharese King

    Abstract: Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments biased in problematic ways about groups like African Americans. While prior research has focused on overt racism in language models, social scientists h…

    Submitted 1 March, 2024; originally announced March 2024.

  9. arXiv:2402.16786  [pdf, other]

    cs.CL cs.AI

    Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

    Authors: Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy

    Abstract: Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around real-world LLM applications. For example, politically-biased LLMs may subtly influence society when they are used by millions of people. Such real-world concerns, however, stand in stark contrast to the artificial…

    Submitted 5 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 (Main Conference)

  10. arXiv:2402.02805  [pdf, other]

    cs.AI cs.CL cs.LG

    Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

    Authors: Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, Janet B. Pierrehumbert

    Abstract: Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LL…

    Submitted 3 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML-2024

  11. arXiv:2402.00159  [pdf, other]

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat…

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  12. arXiv:2312.10523  [pdf, other]

    cs.CL cs.AI cs.LG

    Paloma: A Benchmark for Evaluating Language Model Fit

    Authors: Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge

    Abstract: Evaluations of language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains--varying distributions of language. We introduce Perplexity Analysis for Language Model Assessment (Paloma), a benchmark to measure LM fit to 546 English and code domains, instead of assuming perplexity on one distribution extrapolate…

    Submitted 7 December, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: Conference: NeurIPS 2024, Project Page: https://paloma.allen.ai/

  13. arXiv:2310.15113  [pdf]

    cs.CL

    Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model

    Authors: Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, Anna Cai, Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek Vijayakumar, Haofei Yu, Hinrich Schütze, Kemal Oflazer, David R. Mortensen

    Abstract: Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (i…

    Submitted 26 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  14. arXiv:2308.11456  [pdf]

    cs.SD eess.AS

    Deep learning-based denoising streamed from mobile phones improves speech-in-noise understanding for hearing aid users

    Authors: Peter Udo Diehl, Hannes Zilly, Felix Sattler, Yosef Singer, Kevin Kepp, Mark Berry, Henning Hasemann, Marlene Zippel, Müge Kaya, Paul Meyer-Rachner, Annett Pudszuhn, Veit M. Hofmann, Matthias Vormann, Elias Sprengel

    Abstract: The hearing loss of almost half a billion people is commonly treated with hearing aids. However, current hearing aids often do not work well in real-world noisy environments. We present a deep learning based denoising system that runs in real time on iPhone 7 and Samsung Galaxy S10 (25ms algorithmic latency). The denoised audio is streamed to the hearing aid, resulting in a total delay of around 7…

    Submitted 22 August, 2023; originally announced August 2023.

  15. arXiv:2306.03003  [pdf, other]

    astro-ph.IM

    KSIM: simulating KIDSpec, a Microwave Kinetic Inductance Detector spectrograph for the optical/NIR

    Authors: V. Benedict Hofmann, Kieran O'Brien

    Abstract: KIDSpec, the Kinetic Inductance Detector Spectrometer, is a proposed optical to near IR Microwave Kinetic Inductance Detector (MKID) spectrograph. MKIDs are superconducting photon counting detectors which are able to resolve the energy of incoming photons and their time of arrival. KIDSpec will use these detectors to separate incoming spectral orders from a grating, thereby not requiring a cross-d…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 17 pages, 13 figures, accepted to RASTI

  16. arXiv:2212.07547  [pdf, other]

    cs.CL cs.AI cs.SI

    Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology

    Authors: Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: We propose a fully unsupervised method to detect bias in contextualized embeddings. The method leverages the assortative information latently encoded by social networks and combines orthogonality regularization, structured sparsity learning, and graph neural networks to find the embedding subspace capturing this information. As a concrete example, we focus on the phenomenon of ideological bias: we…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: ICML 2022

  17. arXiv:2210.13181  [pdf, other]

    cs.CL

    The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative

    Authors: Leonie Weissweiler, Valentin Hofmann, Abdullatif Köksal, Hinrich Schütze

    Abstract: Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step towards assessing the compatibility of CxG with the syntact…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  18. What could KIDSpec, a new MKID spectrograph, do on the ELT?

    Authors: V. Benedict Hofmann, Kieran O'Brien, Deli Geng

    Abstract: Microwave Kinetic Inductance Detectors (MKIDs) are beginning to become more prominent in astronomical instrumentation, due to their sensitivity, low noise, high pixel count for superconducting detectors, and inherent energy and time resolving capability. The Kinetic Inductance Detector Spectrometer (KIDSpec) will take advantage of these features, KIDSpec is a medium resolution MKID spectrograph fo…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: Presented at SPIE Astronomical Telescopes & Instrumentation 2022. 7 pages, 4 figures

    Journal ref: Proc. SPIE 12184, Ground-based and Airborne Instrumentation for Astronomy IX, 1218419 (29 August 2022)

  19. arXiv:2206.11567  [pdf]

    cs.SD eess.AS q-bio.NC

    Restoring speech intelligibility for hearing aid users with deep learning

    Authors: Peter Udo Diehl, Yosef Singer, Hannes Zilly, Uwe Schönfeld, Paul Meyer-Rachner, Mark Berry, Henning Sprekeler, Elias Sprengel, Annett Pudszuhn, Veit M. Hofmann

    Abstract: Almost half a billion people world-wide suffer from disabling hearing loss. While hearing aids can partially compensate for this, a large proportion of users struggle to understand speech in situations with background noise. Here, we present a deep learning-based algorithm that selectively suppresses noise while maintaining speech signals. The algorithm restores speech intelligibility for hearing…

    Submitted 23 June, 2022; originally announced June 2022.

  20. arXiv:2203.10010  [pdf, other]

    cs.CL

    CaMEL: Case Marker Extraction without Labels

    Authors: Leonie Weissweiler, Valentin Hofmann, Masoud Jalili Sabet, Hinrich Schütze

    Abstract: We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a s…

    Submitted 28 March, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  21. arXiv:2203.08769  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall quant-ph

    Room temperature donor incorporation for quantum devices: arsine on germanium

    Authors: Emily V. S. Hofmann, Taylor J. Z. Stock, Oliver Warschkow, Rebecca Conybeare, Neil J. Curson, Steven R. Schofield

    Abstract: Germanium has emerged as an exceptionally promising material for spintronics and quantum information applications, with significant fundamental advantages over silicon. However, efforts to create atomic-scale devices using donor atoms as qubits have largely focussed on phosphorus in silicon. Positioning phosphorus in silicon with atomic-scale precision requires a thermal incorporation anneal, but…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: 8 pages, 4 figures, plus 2 pages supplementary information and 1 supplementary figure

  22. arXiv:2203.08565  [pdf, other]

    cs.CL

    Geographic Adaptation of Pretrained Language Models

    Authors: Valentin Hofmann, Goran Glavaš, Nikola Ljubešić, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: While pretrained language models (PLMs) have been shown to possess a plethora of linguistic knowledge, the existing body of research has largely neglected extralinguistic knowledge, which is generally difficult to obtain by pretraining on text alone. Here, we contribute to closing this gap by examining geolinguistic knowledge, i.e., knowledge about geographic variation in language. We introduce ge…

    Submitted 28 January, 2024; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: TACL 2024 (pre-MIT Press publication version)

  23. arXiv:2104.08829  [pdf, other]

    cs.CL cs.AI cs.SI

    Modeling Ideological Salience and Framing in Polarized Online Groups with Graph Neural Networks and Structured Sparsity

    Authors: Valentin Hofmann, Xiaowen Dong, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: The increasing polarization of online political discourse calls for computational tools that automatically detect and monitor ideological divides in social media. We introduce a minimally supervised method that leverages the network structure of online discussion forums, specifically Reddit, to detect polarized concepts. We model polarization along the dimensions of salience and framing, drawing u…

    Submitted 14 December, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: NAACL 2022 (Findings)

  24. arXiv:2101.00403  [pdf, other]

    cs.CL

    Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words

    Authors: Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: How does the input segmentation of pretrained language models (PLMs) affect their interpretations of complex words? We present the first study investigating this question, taking BERT as the example PLM and focusing on its semantic representations of English derivatives. We show that PLMs can be interpreted as serial dual-route models, i.e., the meanings of complex words are either stored or else…

    Submitted 2 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: ACL 2021

  25. arXiv:2010.12684  [pdf, other]

    cs.CL

    Dynamic Contextualized Word Embeddings

    Authors: Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts. Building on prior work on contextualized and dynamic word embeddings, we introduce dynamic contextualized word embeddings that represent words as a function of both linguistic and extralinguistic context. Based on a pretrained language…

    Submitted 8 June, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: ACL 2021

  26. arXiv:2005.00672  [pdf, other]

    cs.CL

    DagoBERT: Generating Derivational Morphology with a Pretrained Language Model

    Authors: Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: Can pretrained language models (PLMs) generate derivationally complex words? We present the first study investigating this question, taking BERT as the example PLM. We examine BERT's derivational capabilities in different settings, ranging from using the unmodified pretrained model to full finetuning. Our best model, DagoBERT (Derivationally and generatively optimized BERT), clearly outperforms th…

    Submitted 7 October, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

  27. arXiv:1910.06685  [pdf]

    physics.app-ph

    Atomic-Scale Patterning of Arsenic in Silicon by Scanning Tunneling Microscopy

    Authors: Taylor J. Z. Stock, Oliver Warschkow, Procopios C. Constantinou, Juerong Li, Sarah Fearn, Eleanor Crane, Emily V. S. Hofmann, Alexander Kölker, David R. McKenzie, Steven R. Schofield, Neil J. Curson

    Abstract: Over the last two decades, prototype devices for future classical and quantum computing technologies have been fabricated, by using scanning tunneling microscopy and hydrogen resist lithography to position phosphorus atoms in silicon with atomic-scale precision. Despite these successes, phosphine remains the only donor precursor molecule to have been demonstrated as compatible with the hydrogen re…

    Submitted 15 October, 2019; originally announced October 2019.