Showing 1–28 of 28 results for author: Bou-Ammar, H

  1. arXiv:2505.16950  [pdf, ps, other]

    cs.LG cs.AI cs.IT

    Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning

    Authors: Adnan Oomerjee, Zafeirios Fountas, Zhongwei Yu, Haitham Bou-Ammar, Jun Wang

    Abstract: Despite their impressive capabilities, Large Language Models struggle with generalisation beyond their training distribution, often exhibiting sophisticated pattern interpolation rather than true abstract reasoning (extrapolation). In this work, we approach this limitation through the lens of Information Bottleneck (IB) theory, which posits that model generalisation emerges from an optimal balance…

    Submitted 5 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  2. arXiv:2502.01702  [pdf, other]

    cs.LG

    Al-Khwarizmi: Discovering Physical Laws with Foundation Models

    Authors: Christopher E. Mower, Haitham Bou-Ammar

    Abstract: Inferring physical laws from data is a central challenge in science and engineering, including but not limited to healthcare, physical sciences, biosciences, social sciences, sustainability, climate, and robotics. Deep networks offer high-accuracy results but lack interpretability, prompting interest in models built from simple components. The Sparse Identification of Nonlinear Dynamics (SINDy) me…

    Submitted 3 February, 2025; originally announced February 2025.

  3. arXiv:2501.01544  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information

    Authors: Rasul Tutunov, Antoine Grosnit, Haitham Bou-Ammar

    Abstract: Post-alignment of large language models (LLMs) is critical in improving their utility, safety, and alignment with human intentions. Direct preference optimisation (DPO) has become one of the most widely used algorithms for achieving this alignment, given its ability to optimise models based on human feedback directly. However, the vast number of DPO variants in the literature has made it increasin…

    Submitted 2 January, 2025; originally announced January 2025.

  4. arXiv:2411.05718  [pdf, other]

    cs.RO cs.AI cs.LG

    A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics

    Authors: Puze Liu, Jonas Günster, Niklas Funk, Simon Gröger, Dong Chen, Haitham Bou-Ammar, Julius Jankowski, Ante Marić, Sylvain Calinon, Andrej Orsula, Miguel Olivares-Mendez, Hongyi Zhou, Rudolf Lioutikov, Gerhard Neumann, Amarildo Likmeta, Amirhossein Zhalehmehrabi, Thomas Bonenfant, Marcello Restelli, Davide Tateo, Ziyuan Liu, Jan Peters

    Abstract: Machine learning methods have a groundbreaking impact in many application domains, but their application on real robotic platforms is still limited. Despite the many challenges associated with combining machine learning technology with robotics, robot learning remains one of the most promising directions for enhancing the capabilities of robots. When deploying learning-based approaches on real rob…

    Submitted 8 November, 2024; originally announced November 2024.

    Comments: Accepted at NeurIPS 2024 Dataset and Benchmark Track

  5. arXiv:2411.03562  [pdf, other]

    cs.LG cs.AI

    Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

    Authors: Antoine Grosnit, Alexandre Maraval, James Doran, Giuseppe Paolo, Albert Thomas, Refinath Shahul Hameed Nabeezath Beevi, Jonas Gonzalez, Khyati Khandelwal, Ignacio Iacobacci, Abdelhakim Benechehab, Hamza Cherkaoui, Youssef Attia El-Hili, Kun Shao, Jianye Hao, Jun Yao, Balazs Kegl, Haitham Bou-Ammar, Jun Wang

    Abstract: We introduce Agent K v1.0, an end-to-end autonomous data science agent designed to automate, optimise, and generalise across diverse data science tasks. Fully automated, Agent K v1.0 manages the entire data science life cycle by learning from experience. It leverages a highly flexible structured reasoning framework to enable it to dynamically process memory in a nested structure, effectively learn…

    Submitted 5 November, 2024; originally announced November 2024.

  6. arXiv:2410.05102  [pdf, other]

    cs.CL cs.AI cs.LG

    SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks

    Authors: Fenia Christopoulou, Ronald Cardenas, Gerasimos Lampouras, Haitham Bou-Ammar, Jun Wang

    Abstract: Preference Optimization (PO) has proven an effective step for aligning language models to human-desired behaviors. Current variants, following the offline Direct Preference Optimization objective, have focused on a strict setting where all tokens are contributing signals of KL divergence and rewards to the loss function. However, human preference is not affected by each word in a sequence equally…

    Submitted 8 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: 20 pages, 9 figures, 5 tables. Under Review

  7. arXiv:2408.09858  [pdf, ps, other]

    cs.LG cs.AR

    ShortCircuit: AlphaZero-Driven Circuit Design

    Authors: Dimitrios Tsaras, Antoine Grosnit, Lei Chen, Zhiyao Xie, Haitham Bou-Ammar, Mingxuan Yuan

    Abstract: Chip design relies heavily on generating Boolean circuits, such as AND-Inverter Graphs (AIGs), from functional descriptions like truth tables. This generation operation is a key process in logic synthesis, a primary chip design stage. While recent advances in deep learning have aimed to accelerate circuit design, these efforts have mostly focused on tasks other than synthesis, and traditional heur…

    Submitted 2 October, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  8. arXiv:2407.09450  [pdf, other]

    cs.AI cs.CL cs.LG q-bio.NC

    Human-like Episodic Memory for Infinite Context LLMs

    Authors: Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee, Fenia Christopoulou, Gerasimos Lampouras, Haitham Bou-Ammar, Jun Wang

    Abstract: Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrat…

    Submitted 25 October, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  9. arXiv:2406.19741  [pdf, other]

    cs.RO cs.AI

    ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

    Authors: Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

    Abstract: We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connect…

    Submitted 12 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This document contains 26 pages and 13 figures

  10. arXiv:2404.09080  [pdf, other]

    cs.RO cs.LG

    Safe Reinforcement Learning on the Constraint Manifold: Theory and Applications

    Authors: Puze Liu, Haitham Bou-Ammar, Jan Peters, Davide Tateo

    Abstract: Integrating learning-based techniques, especially reinforcement learning, into robotics is promising for solving complex problems in unstructured environments. However, most existing approaches are trained in well-tuned simulators and subsequently deployed on real robots without online fine-tuning. In this setting, extensive engineering is required to mitigate the sim-to-real gap, which can be cha…

    Submitted 6 November, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: 19 pages; submitted to IEEE Transactions on Robotics

  11. arXiv:2403.01928  [pdf, other]

    cs.RO

    ZSL-RPPO: Zero-Shot Learning for Quadrupedal Locomotion in Challenging Terrains using Recurrent Proximal Policy Optimization

    Authors: Yao Zhao, Tao Wu, Yijie Zhu, Xiang Lu, Jun Wang, Haitham Bou-Ammar, Xinyu Zhang, Peng Du

    Abstract: We present ZSL-RPPO, an improved zero-shot learning architecture that overcomes the limitations of teacher-student neural networks and enables generating robust, reliable, and versatile locomotion for quadrupedal robots in challenging terrains. We propose a new algorithm RPPO (Recurrent Proximal Policy Optimization) that directly trains recurrent neural network in partially observable environments…

    Submitted 4 March, 2024; originally announced March 2024.

  12. arXiv:2402.13210  [pdf, other]

    cs.LG

    Bayesian Reward Models for LLM Alignment

    Authors: Adam X. Yang, Maxime Robeyns, Thomas Coste, Zhengyan Shi, Jun Wang, Haitham Bou-Ammar, Laurence Aitchison

    Abstract: To ensure that large language model (LLM) responses are helpful and non-toxic, a reward model trained on human preference data is usually used. LLM responses with high rewards are then selected through best-of-$n$ (BoN) sampling or the LLM is further optimized to produce responses with high rewards through reinforcement learning from human feedback (RLHF). However, these processes are susceptible…

    Submitted 2 July, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  13. arXiv:2312.14878  [pdf, other]

    cs.AI cs.LG

    Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

    Authors: Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang

    Abstract: A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: paper and appendix, 27 pages

  14. arXiv:2310.13571  [pdf, ps, other]

    cs.CL

    Why Can Large Language Models Generate Correct Chain-of-Thoughts?

    Authors: Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar

    Abstract: This paper delves into the capabilities of large language models (LLMs), specifically focusing on advancing the theoretical comprehension of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To achieve this, we introduce a two-level hierarchical graphical model tailored for natural language generation. Within this framework, we…

    Submitted 6 June, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

  15. arXiv:2301.12844  [pdf, other]

    cs.LG stat.ML

    Are Random Decompositions all we need in High Dimensional Bayesian Optimisation?

    Authors: Juliusz Ziomek, Haitham Bou-Ammar

    Abstract: Learning decompositions of expensive-to-evaluate black-box functions promises to scale Bayesian optimisation (BO) to high-dimensional problems. However, the success of these techniques depends on finding proper decompositions that accurately represent the black-box. While previous works learn those decompositions based on data, we investigate data-independent decomposition sampling rules in this p…

    Submitted 29 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  16. arXiv:2301.12412  [pdf, other]

    cs.LG

    Contextual Causal Bayesian Optimisation

    Authors: Vahan Arsenyan, Antoine Grosnit, Haitham Bou-Ammar

    Abstract: Causal Bayesian optimisation (CaBO) combines causality with Bayesian optimisation (BO) and shows that there are situations where the optimal reward is not achievable if causal knowledge is ignored. While CaBO exploits causal relations to determine the set of controllable variables to intervene on, it does not exploit purely observational variables and marginalises them. We show that, in general, u…

    Submitted 29 January, 2025; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: 8 pages (not counting references and appendix), 4 figures, 3 graphs

  17. arXiv:2301.04330  [pdf, other]

    cs.RO cs.AI

    Fast Kinodynamic Planning on the Constraint Manifold with Deep Neural Networks

    Authors: Piotr Kicki, Puze Liu, Davide Tateo, Haitham Bou-Ammar, Krzysztof Walas, Piotr Skrzypczyński, Jan Peters

    Abstract: Motion planning is a mature area of research in robotics with many well-established methods based on optimization or sampling the state space, suitable for solving kinematic motion planning. However, when dynamic motions under constraints are needed and computation time is limited, fast kinodynamic planning on the constraint manifold is indispensable. In recent years, learning-based solutions have…

    Submitted 12 January, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    ACM Class: I.2.9; I.2.6

  18. arXiv:2202.06558  [pdf, other]

    cs.LG cs.AI

    Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

    Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar

    Abstract: Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting…

    Submitted 22 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  19. arXiv:2202.06557  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning in Presence of Discrete Markovian Context Evolution

    Authors: Hang Ren, Aivar Sootla, Taher Jafferjee, Junxiao Shen, Jun Wang, Haitham Bou-Ammar

    Abstract: We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it using a Bayesian approach and variational inferenc…

    Submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted to ICLR 2022

  20. arXiv:2202.01388  [pdf, other]

    quant-ph cs.LG

    Self-consistent Gradient-like Eigen Decomposition in Solving Schrödinger Equations

    Authors: Xihan Li, Xiang Chen, Rasul Tutunov, Haitham Bou-Ammar, Lei Wang, Jun Wang

    Abstract: The Schrödinger equation is at the heart of modern quantum mechanics. Since exact solutions of the ground state are typically intractable, standard approaches approximate Schrödinger equation as forms of nonlinear generalized eigenvalue problems $F(V)V = SV\Lambda$ in which $F(V)$, the matrix to be decomposed, is a function of its own top-$k$ smallest eigenvectors $V$, leading to a "self-consistency pro…

    Submitted 2 February, 2022; originally announced February 2022.

  21. arXiv:2201.12570  [pdf, other]

    q-bio.BM cs.AI cs.LG cs.NE stat.ML

    AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation

    Authors: Asif Khan, Alexander I. Cowen-Rivers, Antoine Grosnit, Derrick-Goh-Xin Deik, Philippe A. Robert, Victor Greiff, Eva Smorodina, Puneet Rawat, Kamil Dreczkowski, Rahmad Akbar, Rasul Tutunov, Dany Bou-Ammar, Jun Wang, Amos Storkey, Haitham Bou-Ammar

    Abstract: Antibodies are canonically Y-shaped multimeric proteins capable of highly specific molecular recognition. The CDRH3 region located at the tip of variable chains of an antibody dominates antigen-binding specificity. Therefore, it is a priority to design optimal antigen-specific CDRH3 regions to develop therapeutic antibodies. However, the combinatorial nature of CDRH3 sequence space makes it imposs…

    Submitted 14 October, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

  22. arXiv:2107.06140  [pdf, other]

    cs.RO

    Efficient and Reactive Planning for High Speed Robot Air Hockey

    Authors: Puze Liu, Davide Tateo, Haitham Bou-Ammar, Jan Peters

    Abstract: Highly dynamic robotic tasks require high-speed and reactive robots. These tasks are particularly challenging due to the physical constraints, hardware limitations, and the high uncertainty of dynamics and sensor measures. To face these issues, it's crucial to design robotics agents that generate precise and fast trajectories and react immediately to environmental changes. Air hockey is an example…

    Submitted 14 July, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  23. arXiv:2106.03609  [pdf, other]

    cs.LG

    High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning

    Authors: Antoine Grosnit, Rasul Tutunov, Alexandre Max Maraval, Ryan-Rhys Griffiths, Alexander I. Cowen-Rivers, Lin Yang, Lin Zhu, Wenlong Lyu, Zhitang Chen, Jun Wang, Jan Peters, Haitham Bou-Ammar

    Abstract: We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the blackbox function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for B…

    Submitted 1 November, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

  24. arXiv:2012.08240  [pdf, other]

    cs.LG stat.ML

    Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?

    Authors: Antoine Grosnit, Alexander I. Cowen-Rivers, Rasul Tutunov, Ryan-Rhys Griffiths, Jun Wang, Haitham Bou-Ammar

    Abstract: Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximi…

    Submitted 17 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

  25. arXiv:2002.03755  [pdf, other]

    cs.LG math.OC stat.ML

    Compositional ADAM: An Adaptive Compositional Solver

    Authors: Rasul Tutunov, Minne Li, Alexander I. Cowen-Rivers, Jun Wang, Haitham Bou-Ammar

    Abstract: In this paper, we present C-ADAM, the first adaptive solver for compositional problems involving a non-linear functional nesting of expected values. We prove that C-ADAM converges to a stationary point in $\mathcal{O}(\delta^{-2.25})$ with $\delta$ being a precision parameter. Moreover, we demonstrate the importance of our results by bridging, for the first time, model-agnostic meta-learning (MAML) and comp…

    Submitted 24 April, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  26. arXiv:1802.06604  [pdf, other]

    cs.AI

    Learning High-level Representations from Demonstrations

    Authors: Garrett Andersen, Peter Vrancx, Haitham Bou-Ammar

    Abstract: Hierarchical learning (HL) is key to solving complex sequential decision problems with long horizons and sparse rewards. It allows learning agents to break up large problems into smaller, more manageable subtasks. A common approach to HL is to provide the agent with a number of high-level skills that solve small parts of the overall problem. A major open question, however, is how to identify a su…

    Submitted 28 February, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

  27. arXiv:1802.03216  [pdf, other]

    cs.AI

    Balancing Two-Player Stochastic Games with Soft Q-Learning

    Authors: Jordi Grau-Moya, Felix Leibfried, Haitham Bou-Ammar

    Abstract: Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning to…

    Submitted 8 January, 2019; v1 submitted 9 February, 2018; originally announced February 2018.

  28. arXiv:1708.01867  [pdf, other]

    cs.AI cs.LG stat.ML

    An Information-Theoretic Optimality Principle for Deep Reinforcement Learning

    Authors: Felix Leibfried, Jordi Grau-Moya, Haitham Bou-Ammar

    Abstract: We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Differen…

    Submitted 20 November, 2018; v1 submitted 6 August, 2017; originally announced August 2017.

    Comments: Presented at the NIPS Deep Reinforcement Learning Workshop, Montreal, Canada, 2018