Skip to main content

Showing 1–3 of 3 results for author: Yi-Chern, T

.
  1. arXiv:2505.15795  [pdf, ps, other

    cs.CL

    Reverse Engineering Human Preferences with Reinforcement Learning

    Authors: Lisa Alazraki, Tan Yi-Chern, Jon Ander Campos, Maximilian Mozes, Marek Rei, Max Bartolo

    Abstract: The capabilities of Large Language Models (LLMs) are routinely evaluated by other LLMs trained to predict human preferences. This framework--known as LLM-as-a-judge--is highly scalable and relatively low cost. However, it is also vulnerable to malicious exploitation, as LLM responses can be tuned to overfit the preferences of the judge. Previous work shows that the answers generated by a candidate… ▽ More

    Submitted 21 May, 2025; originally announced May 2025.

  2. arXiv:2504.00698  [pdf

    cs.CL cs.AI cs.LG

    Command A: An Enterprise-Ready Large Language Model

    Authors: Team Cohere, :, Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Milad Alizadeh, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Raphaël Avalos, Zahara Aviv, Sammie Bae, Saurabh Baji, Alexandre Barbet, Max Bartolo, Björn Bebensee, Neeral Beladia, Walter Beller-Morales, Alexandre Bérard, Andrew Berneshawi, Anna Bialas, Phil Blunsom , et al. (205 additional authors not shown)

    Abstract: In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Genera… ▽ More

    Submitted 14 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: 55 pages

  3. arXiv:2502.08550  [pdf, other

    cs.CL cs.AI

    No Need for Explanations: LLMs can implicitly learn from mistakes in-context

    Authors: Lisa Alazraki, Maximilian Mozes, Jon Ander Campos, Tan Yi-Chern, Marek Rei, Max Bartolo

    Abstract: Showing incorrect answers to Large Language Models (LLMs) is a popular strategy to improve their performance in reasoning-intensive tasks. It is widely assumed that, in order to be helpful, the incorrect answers must be accompanied by comprehensive rationales, explicitly detailing where the mistakes are and how to correct them. However, in this work we present a counterintuitive finding: we observ… ▽ More

    Submitted 21 May, 2025; v1 submitted 12 February, 2025; originally announced February 2025.