Showing 1–21 of 21 results for author: Mozes, M

  1. arXiv:2505.15795  [pdf, ps, other]

    cs.CL

    Reverse Engineering Human Preferences with Reinforcement Learning

    Authors: Lisa Alazraki, Tan Yi-Chern, Jon Ander Campos, Maximilian Mozes, Marek Rei, Max Bartolo

    Abstract: The capabilities of Large Language Models (LLMs) are routinely evaluated by other LLMs trained to predict human preferences. This framework--known as LLM-as-a-judge--is highly scalable and relatively low cost. However, it is also vulnerable to malicious exploitation, as LLM responses can be tuned to overfit the preferences of the judge. Previous work shows that the answers generated by a candidate…

    Submitted 21 May, 2025; originally announced May 2025.

  2. arXiv:2504.00698  [pdf]

    cs.CL cs.AI cs.LG

    Command A: An Enterprise-Ready Large Language Model

    Authors: Team Cohere, Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Milad Alizadeh, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Raphaël Avalos, Zahara Aviv, Sammie Bae, Saurabh Baji, Alexandre Barbet, Max Bartolo, Björn Bebensee, Neeral Beladia, Walter Beller-Morales, Alexandre Bérard, Andrew Berneshawi, Anna Bialas, Phil Blunsom, et al. (205 additional authors not shown)

    Abstract: In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Genera…

    Submitted 14 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: 55 pages

  3. arXiv:2502.08550  [pdf, other]

    cs.CL cs.AI

    No Need for Explanations: LLMs can implicitly learn from mistakes in-context

    Authors: Lisa Alazraki, Maximilian Mozes, Jon Ander Campos, Tan Yi-Chern, Marek Rei, Max Bartolo

    Abstract: Showing incorrect answers to Large Language Models (LLMs) is a popular strategy to improve their performance in reasoning-intensive tasks. It is widely assumed that, in order to be helpful, the incorrect answers must be accompanied by comprehensive rationales, explicitly detailing where the mistakes are and how to correct them. However, in this work we present a counterintuitive finding: we observ…

    Submitted 21 May, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

  4. arXiv:2411.12580  [pdf, other]

    cs.CL cs.LG

    Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

    Authors: Laura Ruis, Maximilian Mozes, Juhan Bae, Siddhartha Rao Kamalakara, Dwarak Talupuru, Acyr Locatelli, Robert Kirk, Tim Rocktäschel, Edward Grefenstette, Max Bartolo

    Abstract: The capabilities and limitations of Large Language Models have been sketched out in great detail in recent years, providing an intriguing yet conflicting picture. On the one hand, LLMs demonstrate a general ability to solve problems. On the other hand, they show surprising reasoning gaps when compared to humans, casting doubt on the robustness of their generalisation strategies. The sheer volume o…

    Submitted 6 March, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: Published at ICLR 2025

  5. arXiv:2402.19334  [pdf, other]

    cs.CL

    Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge

    Authors: Ansh Arora, Xuanli He, Maximilian Mozes, Srinibas Swain, Mark Dras, Qiongkai Xu

    Abstract: The democratization of pre-trained language models through open-source initiatives has rapidly advanced innovation and expanded access to cutting-edge technologies. However, this openness also brings significant security risks, including backdoor attacks, where hidden malicious behaviors are triggered by specific inputs, compromising natural language processing (NLP) system integrity and reliabili…

    Submitted 3 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 (Findings)

  6. arXiv:2308.12833  [pdf, other]

    cs.CL cs.CR

    Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

    Authors: Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin

    Abstract: Spurred by the recent rapid increase in the development and distribution of large language models (LLMs) across industry and academia, much recent work has drawn attention to safety- and security-related threats and vulnerabilities of LLMs, including in the context of potentially criminal activities. Specifically, it has been shown that LLMs can be misused for fraud, impersonation, and the generat…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Pre-print

  7. arXiv:2307.10169  [pdf, other]

    cs.CL cs.AI cs.LG

    Challenges and Applications of Large Language Models

    Authors: Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy

    Abstract: Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 72 pages. v01. Work in progress. Feedback and comments are highly appreciated!

  8. arXiv:2303.06074  [pdf]

    cs.CL

    Susceptibility to Influence of Large Language Models

    Authors: Lewis D Griffin, Bennett Kleinberg, Maximilian Mozes, Kimberly T Mai, Maria Vau, Matthew Caldwell, Augustine Marvor-Parker

    Abstract: Two studies tested the hypothesis that a Large Language Model (LLM) can be used to model psychological change following exposure to influential input. The first study tested a generic mode of influence - the Illusory Truth Effect (ITE) - where earlier exposure to a statement (through, for example, rating its interest) boosts a later truthfulness test rating. Data was collected from 1000 human part…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: 24 pages, 6 figures, 7 tables, 53 references

    ACM Class: J.4; I.2.m; I.2.7

  9. arXiv:2302.06598  [pdf, other]

    cs.CL

    Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning

    Authors: Maximilian Mozes, Tolga Bolukbasi, Ann Yuan, Frederick Liu, Nithum Thain, Lucas Dixon

    Abstract: Pretrained large language models (LLMs) are able to solve a wide variety of tasks through transfer learning. Various explainability methods have been developed to investigate their decision making process. TracIn (Pruthi et al., 2020) is one such gradient-based method which explains model inferences based on the influence of training examples. In this paper, we explore the use of TracIn to improve…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: Pre-print

  10. arXiv:2302.06541  [pdf, other]

    cs.CL

    Towards Agile Text Classifiers for Everyone

    Authors: Maximilian Mozes, Jessica Hoffmann, Katrin Tomanek, Muhamed Kouate, Nithum Thain, Ann Yuan, Tolga Bolukbasi, Lucas Dixon

    Abstract: Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots. However, different policies require different classifiers, and safety policies themselves improve from iteration and adaptation. This paper introduces and evaluates methods for agile text cla…

    Submitted 21 October, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Findings of EMNLP 2023

  11. arXiv:2210.11598  [pdf, other]

    cs.CL

    Identifying Human Strategies for Generating Word-Level Adversarial Examples

    Authors: Maximilian Mozes, Bennett Kleinberg, Lewis D. Griffin

    Abstract: Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples against fine-tuned Transformer models that preserve naturalness and grammaticality. Previous work found that human- and machine-generated adversarial examples are comparable in their naturalness and grammatical correctness. Most notably, humans wer…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  12. arXiv:2208.13081  [pdf, other]

    cs.CL cs.CY

    Textwash -- automated open-source text anonymisation

    Authors: Bennett Kleinberg, Toby Davies, Maximilian Mozes

    Abstract: The increased use of text data in social science research has benefited from easy-to-access data (e.g., Twitter). That trend comes at the cost of research requiring sensitive but hard-to-share data (e.g., interview data, police reports, electronic health records). We introduce a solution to that stalemate with the open-source text anonymisation software Textwash. This paper presents the empirical…

    Submitted 27 August, 2022; originally announced August 2022.

  13. arXiv:2109.11398  [pdf, other]

    cs.CV

    Scene Graph Generation for Better Image Captioning?

    Authors: Maximilian Mozes, Martin Schmitt, Vladimir Golkov, Hinrich Schütze, Daniel Cremers

    Abstract: We investigate the incorporation of visual relationships into the task of supervised image caption generation by proposing a model that leverages detected objects and auto-generated visual relationships to describe images in natural language. To do so, we first generate a scene graph from raw image pixels by identifying individual objects and visual relationships between them. This scene graph the…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: Technical report. This work was done and the paper was written in 2019

  14. arXiv:2109.04385  [pdf, other]

    cs.CL

    Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

    Authors: Maximilian Mozes, Max Bartolo, Pontus Stenetorp, Bennett Kleinberg, Lewis D. Griffin

    Abstract: Research shows that natural language processing models are generally considered to be vulnerable to adversarial attacks; but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of wheth…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  15. arXiv:2107.03466  [pdf]

    cs.CL cs.SI

    A repeated-measures study on emotional responses after a year in the pandemic

    Authors: Maximilian Mozes, Isabelle van der Vegt, Bennett Kleinberg

    Abstract: The introduction of COVID-19 lockdown measures and an outlook on return to normality are demanding societal changes. Among the most pressing questions is how individuals adjust to the pandemic. This paper examines the emotional responses to the pandemic in a repeated-measures design. Data (n=1698) were collected in April 2020 (during strict lockdown measures) and in April 2021 (when vaccination pr…

    Submitted 16 November, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: author version of accepted paper (Scientific Reports)

  16. arXiv:2103.09263  [pdf, other]

    cs.CL

    No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization

    Authors: Maximilian Mozes, Bennett Kleinberg

    Abstract: For sensitive text data to be shared among NLP researchers and practitioners, shared documents need to comply with data protection and privacy laws. There is hence a growing interest in automated approaches for text anonymization. However, measuring such methods' performance is challenging: missing a single identifying attribute can reveal an individual's identity. In this paper, we draw attention…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: pre-print; under review

  17. arXiv:2009.04798  [pdf]

    cs.CL

    The Grievance Dictionary: Understanding Threatening Language Use

    Authors: Isabelle van der Vegt, Maximilian Mozes, Bennett Kleinberg, Paul Gill

    Abstract: This paper introduces the Grievance Dictionary, a psycholinguistic dictionary which can be used to automatically understand language use in the context of grievance-fuelled violence threat assessment. We describe the development of the dictionary, which was informed by suggestions from experienced threat assessment practitioners. These suggestions and subsequent human and computational word list gene…

    Submitted 10 September, 2020; originally announced September 2020.

    Comments: pre-print

  18. arXiv:2004.05887  [pdf, other]

    cs.CL

    Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

    Authors: Maximilian Mozes, Pontus Stenetorp, Bennett Kleinberg, Lewis D. Griffin

    Abstract: Recent efforts have shown that neural text processing models are vulnerable to adversarial examples, but the nature of these examples is poorly understood. In this work, we show that adversarial attacks against CNN, LSTM and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between replaced words and their corresponding substitut…

    Submitted 26 January, 2021; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: EACL 2021 camera-ready

  19. arXiv:2004.04225  [pdf, other]

    cs.CL cs.IR cs.SI

    Measuring Emotions in the COVID-19 Real World Worry Dataset

    Authors: Bennett Kleinberg, Isabelle van der Vegt, Maximilian Mozes

    Abstract: The COVID-19 pandemic is having a dramatic impact on societies and economies around the world. With various measures of lockdowns and social distancing in place, it becomes important to understand emotional responses on a large scale. In this paper, we present the first ground truth dataset of emotional responses to COVID-19. We asked participants to indicate their emotions and express these in te…

    Submitted 14 May, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: Accepted to ACL 2020 COVID-19 workshop

  20. arXiv:1908.11599  [pdf]

    cs.CL

    Online influence, offline violence: Language Use on YouTube surrounding the 'Unite the Right' rally

    Authors: Isabelle van der Vegt, Maximilian Mozes, Paul Gill, Bennett Kleinberg

    Abstract: The media frequently describes the 2017 Charlottesville 'Unite the Right' rally as a turning point for the alt-right and white supremacist movements. Social movement theory suggests that the media attention and public discourse concerning the rally may have influenced the alt-right, but this has yet to be empirically tested. The current study investigates whether there are differences in language…

    Submitted 6 April, 2020; v1 submitted 30 August, 2019; originally announced August 2019.

    Comments: pre-print (pre-peer review)

  21. arXiv:1808.09722  [pdf]

    cs.CL

    Identifying the sentiment styles of YouTube's vloggers

    Authors: Bennett Kleinberg, Maximilian Mozes, Isabelle van der Vegt

    Abstract: Vlogs provide a rich public source of data in a novel setting. This paper examined the continuous sentiment styles employed in 27,333 vlogs using a dynamic intra-textual approach to sentiment analysis. Using unsupervised clustering, we identified seven distinct continuous sentiment trajectories characterized by fluctuations of sentiment throughout a vlog's narrative time. We provide a taxonomy of…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: 10 pages, EMNLP 2018