
Showing 1–16 of 16 results for author: Chan, A J

  1. arXiv:2502.18356

    cs.LG

    WebGames: Challenging General-Purpose Web-Browsing AI Agents

    Authors: George Thomas, Alex J. Chan, Jikun Kang, Wenqi Wu, Filippos Christianos, Fraser Greenlee, Andy Toulis, Marvin Purtorab

    Abstract: We introduce WebGames, a comprehensive benchmark suite designed to evaluate general-purpose web-browsing AI agents through a collection of 50+ interactive challenges. These challenges are specifically crafted to be straightforward for humans while systematically testing the limitations of current AI systems across fundamental browser interactions, advanced input processing, cognitive tasks, workfl…

    Submitted 25 February, 2025; originally announced February 2025.

  2. arXiv:2502.06049

    cs.CL cs.AI

    LM2: Large Memory Models

    Authors: Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, Marvin Purtorab, Andy Toulis

    Abstract: This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts. The proposed LM2 incorporates a memory module that acts as a contextual representation reposi…

    Submitted 9 February, 2025; originally announced February 2025.

  3. arXiv:2406.08414

    cs.LG

    Discovering Preference Optimization Algorithms with and for Large Language Models

    Authors: Chris Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, Mihaela van der Schaar, Robert Tjarko Lange

    Abstract: Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. Typically, preference optimization is approached as an offline supervised learning task using manually-crafted convex loss functions. While these methods are based on theoretical insights, they are inherently constrained by human creativity, so the large search space of…

    Submitted 2 November, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2402.00782

    cs.LG

    Dense Reward for Free in Reinforcement Learning from Human Feedback

    Authors: Alex J. Chan, Hao Sun, Samuel Holt, Mihaela van der Schaar

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been credited as the key advance that has allowed Large Language Models (LLMs) to effectively follow instructions and produce useful assistance. Classically, this involves generating completions from the LLM in response to a query before using a separate reward model to assign a score to the full completion. As an auto-regressive process, the L…

    Submitted 1 February, 2024; originally announced February 2024.

  5. arXiv:2312.02401

    stat.ML cs.LG cs.SI

    Enhancing Content Moderation with Culturally-Aware Models

    Authors: Alex J. Chan, José Luis Redondo García, Fabrizio Silvestri, Colm O'Donnell, Konstantina Palla

    Abstract: Content moderation on a global scale must navigate a complex array of local cultural distinctions, which can hinder effective enforcement. While global policies aim for consistency and broad applicability, they often miss the subtleties of regional language interpretation, cultural beliefs, and local legislation. This work introduces a flexible framework that enhances foundation language models wi…

    Submitted 5 November, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: 7 pages, 7 Figures. Supplementary material

  6. arXiv:2311.14110

    cs.LG cs.AI

    When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective

    Authors: Hao Sun, Alex J. Chan, Nabeel Seedat, Alihan Hüyük, Mihaela van der Schaar

    Abstract: Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging. On the one hand, it brings opportunities for safe policy improvement under high-stakes scenarios like clinical guidelines. On the other hand, such opportunities raise a need for precise off-policy evaluation (OPE). While previous work on OPE focused on improving the algorithm in value esti…

    Submitted 28 October, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: Reward Modeling, Large Language Models, RLHF, Off-Policy Evaluation, Data-Centric AI, Data-Centric Reinforcement Learning, Reinforcement Learning

  7. arXiv:2311.07426

    cs.LG cs.CV cs.HC

    Optimising Human-AI Collaboration by Learning Convincing Explanations

    Authors: Alex J. Chan, Alihan Huyuk, Mihaela van der Schaar

    Abstract: Machine learning models are being increasingly deployed to take, or assist in taking, complicated and high-impact decisions, from quasi-autonomous vehicles to clinical decision support systems. This poses challenges, particularly when models have hard-to-detect failure modes and are able to take actions without oversight. In order to handle this challenge, we propose a method for a collaborative s…

    Submitted 13 November, 2023; originally announced November 2023.

  8. arXiv:2309.15840

    cs.CL cs.AI cs.LG

    How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

    Authors: Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner

    Abstract: Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when instructed to output misinformation. Here, we develop a simple lie detector that requires neither access to the LLM's activations (black-box) nor ground-truth knowledge of the fact in question. The detector works by asking a…

    Submitted 26 September, 2023; originally announced September 2023.

  9. arXiv:2211.06138

    cs.LG cs.CY stat.ML

    Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes

    Authors: Tennison Liu, Alex J. Chan, Boris van Breugel, Mihaela van der Schaar

    Abstract: It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences. Fair ML has largely focused on the protection of single attributes in the simpler setting where both attributes and target outcomes are binary. However, the practical application in many a real-world problem entails the simultaneous protection of m…

    Submitted 11 November, 2022; originally announced November 2022.

  10. arXiv:2210.05320

    cs.LG

    Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning

    Authors: Alex J. Chan, Mihaela van der Schaar

    Abstract: Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data - instead given access to a set of expert models and their predictions alongside some limited information about the dataset used to train them. In scenarios from finance to the medical sciences, and even consumer practice, stakeholders have developed models on private data they eit…

    Submitted 11 October, 2022; originally announced October 2022.

  11. arXiv:2203.08057

    cs.LG

    POETREE: Interpretable Policy Learning with Adaptive Decision Trees

    Authors: Alizée Pace, Alex J. Chan, Mihaela van der Schaar

    Abstract: Building models of human decision-making from observed behaviour is critical to better understand, diagnose and support real-world policies such as clinical care. As established policy learning approaches remain focused on imitation performance, they fall short of explaining the demonstrated decision-making process. Policy Extraction through decision Trees (POETREE) is a novel framework for interp…

    Submitted 30 September, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

  12. arXiv:2203.07338

    cs.LG

    Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies

    Authors: Alex J. Chan, Alicia Curth, Mihaela van der Schaar

    Abstract: Human decision making is well known to be imperfect and the ability to analyse such processes individually is crucial when attempting to aid or improve a decision-maker's ability to perform a task, e.g. to alert them to potential biases or oversights on their part. To do so, it is necessary to develop interpretable representations of how agents make decisions and how this process changes over time…

    Submitted 30 September, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

  13. arXiv:2106.04240

    cs.LG

    The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation

    Authors: Alex J. Chan, Ioana Bica, Alihan Huyuk, Daniel Jarrett, Mihaela van der Schaar

    Abstract: Understanding decision-making in clinical environments is of paramount importance if we are to bring the strengths of machine learning to ultimately improve patient outcomes. Several factors including the availability of public data, the intrinsically offline nature of the problem, and the complexity of human decision making, has meant that the mainstream development of algorithms is often geared…

    Submitted 14 March, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

  14. arXiv:2102.06483

    cs.LG

    Scalable Bayesian Inverse Reinforcement Learning

    Authors: Alex J. Chan, Mihaela van der Schaar

    Abstract: Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately current methods generally do not scale well beyond the small tabular setting due to the need for an inner-loop MDP solver, and even non-Bayesian methods that do themselves scale often require extensive interaction with the environment to perform well, b…

    Submitted 11 March, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

  15. arXiv:2006.14988

    stat.ML cs.LG

    Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift

    Authors: Alex J. Chan, Ahmed M. Alaa, Zhaozhi Qian, Mihaela van der Schaar

    Abstract: Modern neural networks have proven to be powerful function approximators, providing state-of-the-art performance in a multitude of applications. They however fall short in their ability to quantify confidence in their predictions - this is crucial in high-stakes applications that involve critical decision-making. Bayesian neural networks (BNNs) aim at solving this problem by placing a prior distri…

    Submitted 26 June, 2020; originally announced June 2020.

  16. arXiv:1303.0729

    math.AC

    Groebner bases over fields with valuations

    Authors: Andrew J. Chan, Diane Maclagan

    Abstract: Let K be a field with a valuation and let S be the polynomial ring S:= K[x_1,..., x_n]. We discuss the extension of Groebner theory to ideals in S, taking the valuations of coefficients into account, and describe the Buchberger algorithm in this context. In addition we discuss some implementation and complexity issues. The main motivation comes from tropical geometry, as tropical varieties can be…

    Submitted 1 September, 2017; v1 submitted 4 March, 2013; originally announced March 2013.

    Comments: Final version to appear in Mathematics of Computation

    MSC Class: 13P10; 14T05