Showing 1–3 of 3 results for author: Young, E J

  1. arXiv:2506.04018  [pdf, ps, other]

    cs.AI cs.CL cs.CY cs.LG

    AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents

    Authors: Akshat Naik, Patrick Quinn, Guillermo Bosch, Emma Gouné, Francisco Javier Campos Zabala, Jason Ross Brown, Edward James Young

    Abstract: As Large Language Model (LLM) agents become more widespread, associated misalignment risks increase. Prior work has examined agents' ability to enact misaligned behaviour (misalignment capability) and their compliance with harmful instructions (misuse propensity). However, the likelihood of agents attempting misaligned behaviours in real-world settings (misalignment propensity) remains poorly unde…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Preprint, under review for NeurIPS 2025

    ACM Class: I.2.7; I.2.11; K.4.1; I.2.6

  2. arXiv:2506.01926  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Large language models can learn and generalize steganographic chain-of-thought under process supervision

    Authors: Joey Skaf, Luis Ibanez-Lissen, Robert McCarthy, Connor Watts, Vasil Georgiv, Hannes Whittingham, Lorena Gonzalez-Manzano, David Lindner, Cameron Tice, Edward James Young, Puria Radmard

    Abstract: Chain-of-thought (CoT) reasoning not only enhances large language model performance but also provides critical insights into decision-making processes, marking it as a useful tool for monitoring model intent and planning. By proactively preventing models from acting on CoT indicating misaligned or harmful intent, CoT monitoring can be used to reduce risks associated with deploying models. However,…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 10 pages main text, 3 figures main text, 15 pages supplementary material, 1 figure supplementary material, submitted to NeurIPS 2025

  3. arXiv:2408.00713  [pdf, other]

    cs.LG math.OC stat.ML

    Reinforcement Learning applied to Insurance Portfolio Pursuit

    Authors: Edward James Young, Alistair Rogers, Elliott Tong, James Jordon

    Abstract: When faced with a new customer, many factors contribute to an insurance firm's decision of what offer to make to that customer. In addition to the expected cost of providing the insurance, the firm must consider the other offers likely to be made to the customer, and how sensitive the customer is to differences in price. Moreover, firms often target a specific portfolio of customers that could dep…

    Submitted 2 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 16 pages, 1 figure