-
Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
Authors:
Matthieu Zimmer,
Xiaotong Ji,
Rasul Tutunov,
Anthony Bordg,
Jun Wang,
Haitham Bou Ammar
Abstract:
Reasoning remains a challenging task for large language models (LLMs), especially within the logically constrained environment of automated theorem proving (ATP), due to sparse rewards and the vast scale of proofs. These challenges are amplified in benchmarks like PutnamBench, which contains university-level problems requiring complex, multi-step reasoning. To address this, we introduce self-gener…
▽ More
Reasoning remains a challenging task for large language models (LLMs), especially within the logically constrained environment of automated theorem proving (ATP), due to sparse rewards and the vast scale of proofs. These challenges are amplified in benchmarks like PutnamBench, which contains university-level problems requiring complex, multi-step reasoning. To address this, we introduce self-generated goal-conditioned MDPs (sG-MDPs), a new framework in which agents generate and pursue their subgoals based on the evolving proof state. Given this more structured generation of goals, the resulting problem becomes more amenable to search. We then apply Monte Carlo Tree Search (MCTS)-like algorithms to solve the sG-MDP, instantiating our approach in Bourbaki (7B), a modular system that can ensemble multiple 7B LLMs for subgoal generation and tactic synthesis. On PutnamBench, Bourbaki (7B) solves 26 problems, achieving new state-of-the-art results with models at this scale.
△ Less
Submitted 3 July, 2025;
originally announced July 2025.
-
Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems
Authors:
Zhongzhi Yu,
Mingjie Liu,
Michael Zimmer,
Yingyan Celine Lin,
Yong Liu,
Haoxing Ren
Abstract:
Despite recent progress in generating hardware RTL code with LLMs, existing solutions still suffer from a substantial gap between practical application scenarios and the requirements of real-world RTL code development. Prior approaches either focus on overly simplified hardware descriptions or depend on extensive human guidance to process complex specifications, limiting their scalability and auto…
▽ More
Despite recent progress in generating hardware RTL code with LLMs, existing solutions still suffer from a substantial gap between practical application scenarios and the requirements of real-world RTL code development. Prior approaches either focus on overly simplified hardware descriptions or depend on extensive human guidance to process complex specifications, limiting their scalability and automation potential. In this paper, we address this gap by proposing an LLM agent system, termed Spec2RTL-Agent, designed to directly process complex specification documentation and generate corresponding RTL code implementations, advancing LLM-based RTL code generation toward more realistic application settings. To achieve this goal, Spec2RTL-Agent introduces a novel multi-agent collaboration framework that integrates three key enablers: (1) a reasoning and understanding module that translates specifications into structured, step-by-step implementation plans; (2) a progressive coding and prompt optimization module that iteratively refines the code across multiple representations to enhance correctness and synthesisability for RTL conversion; and (3) an adaptive reflection module that identifies and traces the source of errors during generation, ensuring a more robust code generation flow. Instead of directly generating RTL from natural language, our system strategically generates synthesizable C++ code, which is then optimized for HLS. This agent-driven refinement ensures greater correctness and compatibility compared to naive direct RTL generation approaches. We evaluate Spec2RTL-Agent on three specification documents, showing it generates accurate RTL code with up to 75% fewer human interventions than existing methods. This highlights its role as the first fully automated multi-agent system for RTL generation from unstructured specs, reducing reliance on human effort in hardware design.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms
Authors:
Hiroshi Kera,
Nico Pelleriti,
Yuki Ishihara,
Max Zimmer,
Sebastian Pokutta
Abstract:
Solving systems of polynomial equations, particularly those with finitely many solutions, is a crucial challenge across many scientific fields. Traditional methods like Gröbner and Border bases are fundamental but suffer from high computational costs, which have motivated recent Deep Learning approaches to improve efficiency, albeit at the expense of output correctness. In this work, we introduce…
▽ More
Solving systems of polynomial equations, particularly those with finitely many solutions, is a crucial challenge across many scientific fields. Traditional methods like Gröbner and Border bases are fundamental but suffer from high computational costs, which have motivated recent Deep Learning approaches to improve efficiency, albeit at the expense of output correctness. In this work, we introduce the Oracle Border Basis Algorithm, the first Deep Learning approach that accelerates Border basis computation while maintaining output guarantees. To this end, we design and train a Transformer-based oracle that identifies and eliminates computationally expensive reduction steps, which we find to dominate the algorithm's runtime. By selectively invoking this oracle during critical phases of computation, we achieve substantial speedup factors of up to 3.5x compared to the base algorithm, without compromising the correctness of results. To generate the training data, we develop a sampling method and provide the first sampling theorem for border bases. We construct a tokenization and embedding scheme tailored to monomial-centered algebraic computations, resulting in a compact and expressive input representation, which reduces the number of tokens to encode an $n$-variate polynomial by a factor of $O(n)$. Our learning approach is data efficient, stable, and a practical enhancement to traditional computer algebra algorithms and symbolic computation.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
Authors:
Alonso Urbano,
David W. Romero,
Max Zimmer,
Sebastian Pokutta
Abstract:
Real-world data often exhibits unknown or approximate symmetries, yet existing equivariant networks must commit to a fixed transformation group prior to training, e.g., continuous $SO(2)$ rotations. This mismatch degrades performance when the actual data symmetries differ from those in the transformation group. We introduce RECON, a framework to discover each input's intrinsic symmetry distributio…
▽ More
Real-world data often exhibits unknown or approximate symmetries, yet existing equivariant networks must commit to a fixed transformation group prior to training, e.g., continuous $SO(2)$ rotations. This mismatch degrades performance when the actual data symmetries differ from those in the transformation group. We introduce RECON, a framework to discover each input's intrinsic symmetry distribution from unlabeled data. RECON leverages class-pose decompositions and applies a data-driven normalization to align arbitrary reference frames into a common natural pose, yielding directly comparable and interpretable symmetry descriptors. We demonstrate effective symmetry discovery on 2D image benchmarks and -- for the first time -- extend it to 3D transformation groups, paving the way towards more flexible equivariant modeling.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Authors:
Ibrahim Fayad,
Max Zimmer,
Martin Schwartz,
Philippe Ciais,
Fabian Gieseke,
Gabriel Belouze,
Sarah Brood,
Aurelien De Truchis,
Alexandre d'Aspremont
Abstract:
Significant efforts have been directed towards adapting self-supervised multimodal learning for Earth observation applications. However, existing methods produce coarse patch-sized embeddings, limiting their effectiveness and integration with other modalities like LiDAR. To close this gap, we present DUNIA, an approach to learn pixel-sized embeddings through cross-modal alignment between images an…
▽ More
Significant efforts have been directed towards adapting self-supervised multimodal learning for Earth observation applications. However, existing methods produce coarse patch-sized embeddings, limiting their effectiveness and integration with other modalities like LiDAR. To close this gap, we present DUNIA, an approach to learn pixel-sized embeddings through cross-modal alignment between images and full-waveform LiDAR data. As the model is trained in a contrastive manner, the embeddings can be directly leveraged in the context of a variety of environmental monitoring tasks in a zero-shot setting. In our experiments, we demonstrate the effectiveness of the embeddings for seven such tasks (canopy height mapping, fractional canopy cover, land cover mapping, tree species identification, plant area index, crop type classification, and per-pixel waveform-based vertical structure mapping). The results show that the embeddings, along with zero-shot classifiers, often outperform specialized supervised models, even in low data regimes. In the fine-tuning setting, we show strong low-shot capabilities with performances near or better than state-of-the-art on five out of six tasks.
△ Less
Submitted 24 February, 2025;
originally announced February 2025.
-
Approximating Latent Manifolds in Neural Networks via Vanishing Ideals
Authors:
Nico Pelleriti,
Max Zimmer,
Elias Wirth,
Sebastian Pokutta
Abstract:
Deep neural networks have reshaped modern machine learning by learning powerful latent representations that often align with the manifold hypothesis: high-dimensional data lie on lower-dimensional manifolds. In this paper, we establish a connection between manifold learning and computational algebra by demonstrating how vanishing ideals can characterize the latent manifolds of deep networks. To th…
▽ More
Deep neural networks have reshaped modern machine learning by learning powerful latent representations that often align with the manifold hypothesis: high-dimensional data lie on lower-dimensional manifolds. In this paper, we establish a connection between manifold learning and computational algebra by demonstrating how vanishing ideals can characterize the latent manifolds of deep networks. To that end, we propose a new neural architecture that (i) truncates a pretrained network at an intermediate layer, (ii) approximates each class manifold via polynomial generators of the vanishing ideal, and (iii) transforms the resulting latent space into linearly separable features through a single polynomial layer. The resulting models have significantly fewer layers than their pretrained baselines, while maintaining comparable accuracy, achieving higher throughput, and utilizing fewer parameters. Furthermore, drawing on spectral complexity analysis, we derive sharper theoretical guarantees for generalization, showing that our approach can in principle offer tighter bounds than standard deep networks. Numerical experiments confirm the effectiveness and efficiency of the proposed approach.
△ Less
Submitted 6 June, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
On Almost Surely Safe Alignment of Large Language Models at Inference-Time
Authors:
Xiaotong Ji,
Shyam Sundhar Ramesh,
Matthieu Zimmer,
Ilija Bogunovic,
Jun Wang,
Haitham Bou Ammar
Abstract:
We introduce a novel inference-time alignment approach for LLMs that aims to generate safe responses almost surely, i.e., with probability approaching one. Our approach models the generation of safe responses as a constrained Markov Decision Process (MDP) within the LLM's latent space. We augment a safety state that tracks the evolution of safety constraints and dynamically penalize unsafe generat…
▽ More
We introduce a novel inference-time alignment approach for LLMs that aims to generate safe responses almost surely, i.e., with probability approaching one. Our approach models the generation of safe responses as a constrained Markov Decision Process (MDP) within the LLM's latent space. We augment a safety state that tracks the evolution of safety constraints and dynamically penalize unsafe generations to ensure the generation of safe responses. Consequently, we demonstrate formal safety guarantees w.r.t. the given cost model upon solving the MDP in the latent space with sufficiently large penalties. Building on this foundation, we propose InferenceGuard, a practical implementation that safely aligns LLMs without modifying the model weights. Empirically, we demonstrate that InferenceGuard effectively balances safety and task performance, outperforming existing inference-time alignment methods in generating safe and aligned responses. Our findings contribute to the advancement of safer LLM deployment through alignment at inference-time, thus presenting a promising alternative to resource-intensive, overfitting-prone alignment techniques like RLHF.
△ Less
Submitted 20 June, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation
Authors:
Jan Pauls,
Max Zimmer,
Berkant Turan,
Sassan Saatchi,
Philippe Ciais,
Sebastian Pokutta,
Fabian Gieseke
Abstract:
With the rise in global greenhouse gas emissions, accurate large-scale tree canopy height maps are essential for understanding forest structure, estimating above-ground biomass, and monitoring ecological disruptions. To this end, we present a novel approach to generate large-scale, high-resolution canopy height maps over time. Our model accurately predicts canopy height over multiple years given S…
▽ More
With the rise in global greenhouse gas emissions, accurate large-scale tree canopy height maps are essential for understanding forest structure, estimating above-ground biomass, and monitoring ecological disruptions. To this end, we present a novel approach to generate large-scale, high-resolution canopy height maps over time. Our model accurately predicts canopy height over multiple years given Sentinel-1 composite and Sentinel~2 time series satellite data. Using GEDI LiDAR data as the ground truth for training the model, we present the first 10m resolution temporal canopy height map of the European continent for the period 2019-2022. As part of this product, we also offer a detailed canopy height map for 2020, providing more precise estimates than previous studies. Our pipeline and the resulting temporal height map are publicly available, enabling comprehensive large-scale monitoring of forests and, hence, facilitating future research and ecological analyses.
△ Less
Submitted 12 June, 2025; v1 submitted 31 January, 2025;
originally announced January 2025.
-
Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?
Authors:
Konrad Mundinger,
Max Zimmer,
Aldo Kiem,
Christoph Spiegel,
Sebastian Pokutta
Abstract:
We demonstrate how neural networks can drive mathematical discovery through a case study of the Hadwiger-Nelson problem, a long-standing open problem at the intersection of discrete geometry and extremal combinatorics that is concerned with coloring the plane while avoiding monochromatic unit-distance pairs. Using neural networks as approximators, we reformulate this mixed discrete-continuous geom…
▽ More
We demonstrate how neural networks can drive mathematical discovery through a case study of the Hadwiger-Nelson problem, a long-standing open problem at the intersection of discrete geometry and extremal combinatorics that is concerned with coloring the plane while avoiding monochromatic unit-distance pairs. Using neural networks as approximators, we reformulate this mixed discrete-continuous geometric coloring problem with hard constraints as an optimization task with a probabilistic, differentiable loss function. This enables gradient-based exploration of admissible configurations that most significantly led to the discovery of two novel six-colorings, providing the first improvement in thirty years to the off-diagonal variant of the original problem. Here, we establish the underlying machine learning approach used to obtain these results and demonstrate its broader applicability through additional numerical insights.
△ Less
Submitted 5 June, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
-
Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations
Authors:
Megan A. Brown,
Andrew Gruen,
Gabe Maldoff,
Solomon Messing,
Zeve Sanderson,
Michael Zimmer
Abstract:
Scientists across disciplines often use data from the internet to conduct research, generating valuable insights about human behavior. However, as generative AI relying on massive text corpora becomes increasingly valuable, platforms have greatly restricted access to data through official channels. As a result, researchers will likely engage in more web scraping to collect data, introducing new ch…
▽ More
Scientists across disciplines often use data from the internet to conduct research, generating valuable insights about human behavior. However, as generative AI relying on massive text corpora becomes increasingly valuable, platforms have greatly restricted access to data through official channels. As a result, researchers will likely engage in more web scraping to collect data, introducing new challenges and concerns for researchers. This paper proposes a comprehensive framework for web scraping in social science research for U.S.-based researchers, examining the legal, ethical, institutional, and scientific factors that researchers should consider when scraping the web. We present an overview of the current regulatory environment impacting when and how researchers can access, collect, store, and share data via scraping. We then provide researchers with recommendations to conduct scraping in a scientifically legitimate and ethical manner. We aim to equip researchers with the relevant information to mitigate risks and maximize the impact of their research amidst this evolving data access landscape.
△ Less
Submitted 19 December, 2024; v1 submitted 30 October, 2024;
originally announced October 2024.
-
Mixture of Attentions For Speculative Decoding
Authors:
Matthieu Zimmer,
Milan Gritta,
Gerasimos Lampouras,
Haitham Bou Ammar,
Jun Wang
Abstract:
The growth in the number of parameters of Large Language Models (LLMs) has led to a significant surge in computational requirements, making them challenging and costly to deploy. Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified by the LLM in parallel. Small models that utilise activations from the LLM currently achieve the fastest dec…
▽ More
The growth in the number of parameters of Large Language Models (LLMs) has led to a significant surge in computational requirements, making them challenging and costly to deploy. Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified by the LLM in parallel. Small models that utilise activations from the LLM currently achieve the fastest decoding speeds. However, we identify several limitations of SD models including the lack of on-policyness during training and partial observability. To address these shortcomings, we propose a more grounded architecture for small models by introducing a Mixture of Attentions for SD. Our novel architecture can be applied in two scenarios: a conventional single device deployment and a novel client-server deployment where the small model is hosted on a consumer device and the LLM on a server. In a single-device scenario, we demonstrate state-of-the-art speedups improving EAGLE-2 by 9.5% and its acceptance length by 25%. In a client-server setting, our experiments demonstrate: 1) state-of-the-art latencies with minimal calls to the server for different network conditions, and 2) in the event of a complete disconnection, our approach can maintain higher accuracy compared to other SD methods and demonstrates advantages over API calls to LLMs, which would otherwise be unable to continue the generation process.
△ Less
Submitted 3 April, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
Authors:
Christopher E. Mower,
Yuhui Wan,
Hongzhan Yu,
Antoine Grosnit,
Jonas Gonzalez-Billandon,
Matthieu Zimmer,
Jinlong Wang,
Xinyu Zhang,
Yao Zhao,
Anbang Zhai,
Puze Liu,
Daniel Palenicek,
Davide Tateo,
Cesar Cadena,
Marco Hutter,
Jan Peters,
Guangjian Tian,
Yuzheng Zhuang,
Kun Shao,
Xingyue Quan,
Jianye Hao,
Jun Wang,
Haitham Bou-Ammar
Abstract:
We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connect…
▽ More
We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.
△ Less
Submitted 12 July, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
-
Estimating Canopy Height at Scale
Authors:
Jan Pauls,
Max Zimmer,
Una M. Kelly,
Martin Schwartz,
Sassan Saatchi,
Philippe Ciais,
Sebastian Pokutta,
Martin Brandt,
Fabian Gieseke
Abstract:
We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regio…
▽ More
We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which depicts a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Comment on "Machine learning conservation laws from differential equations"
Authors:
Michael F. Zimmer
Abstract:
The paper [1] by Liu, Madhavan, and Tegmark sought to use machine learning methods to elicit known conservation laws for several systems. However, in their example of a damped 1D harmonic oscillator they made seven serious errors, causing both their method and result to be incorrect. In this Comment, those errors are reviewed.
The paper [1] by Liu, Madhavan, and Tegmark sought to use machine learning methods to elicit known conservation laws for several systems. However, in their example of a damped 1D harmonic oscillator they made seven serious errors, causing both their method and result to be incorrect. In this Comment, those errors are reviewed.
△ Less
Submitted 31 January, 2025; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Constants of Motion for Conserved and Non-conserved Dynamics
Authors:
Michael F. Zimmer
Abstract:
This paper begins with a dynamical model that was obtained by applying a machine learning technique (FJet) to time-series data; this dynamical model is then analyzed with Lie symmetry techniques to obtain constants of motion. This analysis is performed on both the conserved and non-conserved cases of the 1D and 2D harmonic oscillators. For the 1D oscillator, constants are found in the cases where…
▽ More
This paper begins with a dynamical model that was obtained by applying a machine learning technique (FJet) to time-series data; this dynamical model is then analyzed with Lie symmetry techniques to obtain constants of motion. This analysis is performed on both the conserved and non-conserved cases of the 1D and 2D harmonic oscillators. For the 1D oscillator, constants are found in the cases where the system is underdamped, overdamped, and critically damped. The novel existence of such a constant for a non-conserved model is interpreted as a manifestation of the conservation of energy of the {\em total} system (i.e., oscillator plus dissipative environment). For the 2D oscillator, constants are found for the isotropic and anisotropic cases, including when the frequencies are incommensurate; it is also generalized to arbitrary dimensions. In addition, a constant is identified which generalizes angular momentum for all ratios of the frequencies. The approach presented here can produce {\em multiple} constants of motion from a {\em single}, generic data set.
△ Less
Submitted 31 January, 2025; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Neural Parameter Regression for Explicit Representations of PDE Solution Operators
Authors:
Konrad Mundinger,
Max Zimmer,
Sebastian Pokutta
Abstract:
We introduce Neural Parameter Regression (NPR), a novel framework specifically developed for learning solution operators in Partial Differential Equations (PDEs). Tailored for operator learning, this approach surpasses traditional DeepONets (Lu et al., 2021) by employing Physics-Informed Neural Network (PINN, Raissi et al., 2019) techniques to regress Neural Network (NN) parameters. By parametrizi…
▽ More
We introduce Neural Parameter Regression (NPR), a novel framework specifically developed for learning solution operators in Partial Differential Equations (PDEs). Tailored for operator learning, this approach surpasses traditional DeepONets (Lu et al., 2021) by employing Physics-Informed Neural Network (PINN, Raissi et al., 2019) techniques to regress Neural Network (NN) parameters. By parametrizing each solution based on specific initial conditions, it effectively approximates a mapping between function spaces. Our method enhances parameter efficiency by incorporating low-rank matrices, thereby boosting computational efficiency and scalability. The framework shows remarkable adaptability to new initial and boundary conditions, allowing for rapid fine-tuning and inference, even in cases of out-of-distribution examples.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
On the Byzantine-Resilience of Distillation-Based Federated Learning
Authors:
Christophe Roux,
Max Zimmer,
Sebastian Pokutta
Abstract:
Federated Learning (FL) algorithms using Knowledge Distillation (KD) have received increasing attention due to their favorable properties with respect to privacy, non-i.i.d. data and communication cost. These methods depart from transmitting model parameters and instead communicate information about a learning task by sharing predictions on a public dataset. In this work, we study the performance…
▽ More
Federated Learning (FL) algorithms using Knowledge Distillation (KD) have received increasing attention due to their favorable properties with respect to privacy, non-i.i.d. data and communication cost. These methods depart from transmitting model parameters and instead communicate information about a learning task by sharing predictions on a public dataset. In this work, we study the performance of such approaches in the byzantine setting, where a subset of the clients act in an adversarial manner aiming to disrupt the learning process. We show that KD-based FL algorithms are remarkably resilient and analyze how byzantine clients can influence the learning process. Based on these insights, we introduce two new byzantine attacks and demonstrate their ability to break existing byzantine-resilient methods. Additionally, we propose a novel defence method which enhances the byzantine resilience of KD-based FL algorithms. Finally, we provide a general framework to obfuscate attacks, making them significantly harder to detect, thereby improving their effectiveness. Our findings serve as an important building block in the analysis of byzantine FL, contributing through the development of new attacks and new defence mechanisms, further advancing the robustness of KD-based FL algorithms.
△ Less
Submitted 17 March, 2025; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control
Authors:
Zheng Xiong,
Risto Vuorio,
Jacob Beck,
Matthieu Zimmer,
Kun Shao,
Shimon Whiteson
Abstract:
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good per…
▽ More
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.
△ Less
Submitted 3 June, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs
Authors:
Max Zimmer,
Megi Andoni,
Christoph Spiegel,
Sebastian Pokutta
Abstract:
Neural Networks can be effectively compressed through pruning, significantly reducing storage and compute demands while maintaining predictive performance. Simple yet effective methods like magnitude pruning remove less important parameters and typically require a costly retraining procedure to restore performance. However, with the rise of LLMs, full retraining has become infeasible due to memory…
▽ More
Neural Networks can be effectively compressed through pruning, significantly reducing storage and compute demands while maintaining predictive performance. Simple yet effective methods like magnitude pruning remove less important parameters and typically require a costly retraining procedure to restore performance. However, with the rise of LLMs, full retraining has become infeasible due to memory and compute constraints. This study challenges the practice of retraining all parameters by showing that updating a small subset of highly expressive parameters can suffice to recover or even enhance performance after pruning. Surprisingly, retraining just 0.01%-0.05% of the parameters in GPT-architectures can match the performance of full retraining across various sparsity levels, significantly reducing compute and memory requirements, and enabling retraining of models with up to 30 billion parameters on a single GPU in minutes. To bridge the gap to full retraining in the high sparsity regime, we introduce two novel LoRA variants that, unlike standard LoRA, allow merging adapters back without compromising sparsity. Going a step further, we show that these methods can be applied for memory-efficient layer-wise reconstruction, significantly enhancing state-of-the-art retraining-free methods like Wanda (Sun et al., 2023) and SparseGPT (Frantar & Alistarh, 2023). Our findings present a promising alternative to avoiding retraining.
△ Less
Submitted 5 February, 2025; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
Authors:
Filippos Christianos,
Georgios Papoudakis,
Matthieu Zimmer,
Thomas Coste,
Zhihao Wu,
Jingxuan Chen,
Khyati Khandelwal,
James Doran,
Xidong Feng,
Jiacheng Liu,
Zheng Xiong,
Yicheng Luo,
Jianye Hao,
Kun Shao,
Haitham Bou-Ammar,
Jun Wang
Abstract:
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information…
▽ More
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information into the perception-action cycle when devising the policy. Large language models (LLMs) emerged as a fundamental way to incorporate cross-domain knowledge into AI agents but lack crucial learning and adaptation toward specific decision problems. This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies. Our methodology is motivated by the modularity found in the human brain. The framework utilises the construction of intrinsic and extrinsic functions to add previous understandings of reasoning structures. It also provides the adaptive ability to learn models inside every module or function, consistent with the modular structure of cognitive processes. We describe the framework in-depth and compare it with other AI pipelines and existing frameworks. The paper explores practical applications, covering experiments that show the effectiveness of our method. Our results indicate that AI agents perform and adapt far better when organised reasoning and prior knowledge are embedded. This opens the door to more resilient and general AI agent systems.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis
Authors:
Philip John Gorinski,
Matthieu Zimmer,
Gerasimos Lampouras,
Derrick Goh Xin Deik,
Ignacio Iacobacci
Abstract:
The advent of large pre-trained language models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, the property of programming language code being precisely evaluable with respect to its semantics -- through…
▽ More
The advent of large pre-trained language models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, the property of programming language code being precisely evaluable with respect to its semantics -- through the use of Unit Tests to check its functional correctness -- lends itself to using Reinforcement Learning (RL) as a further training paradigm. Previous work has shown that RL can be applied as such to improve models' coding capabilities; however, such RL-based methods rely on a reward signal based on defined Unit Tests, which are much harder to obtain compared to the huge crawled code datasets used in LM objectives. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests, suitable for RL training of Code Synthesis models. We also introduce a straightforward, simple yet effective Actor-Critic RL training scheme and show that it, in conjunction with automatically generated training data, leads to improvement of a pre-trained code language model's performance by up to 9.9% improvement over the original underlying code synthesis LM, and up to 4.3% over RL-based models trained with standard PPO or CodeRL.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Authors:
Max Zimmer,
Christoph Spiegel,
Sebastian Pokutta
Abstract:
Neural networks can be significantly compressed by pruning, yielding sparse models with reduced storage and computational demands while preserving predictive performance. Model soups (Wortsman et al., 2022) enhance generalization and out-of-distribution (OOD) performance by averaging the parameters of multiple models into a single one, without increasing inference time. However, achieving both spa…
▽ More
Neural networks can be significantly compressed by pruning, yielding sparse models with reduced storage and computational demands while preserving predictive performance. Model soups (Wortsman et al., 2022) enhance generalization and out-of-distribution (OOD) performance by averaging the parameters of multiple models into a single one, without increasing inference time. However, achieving both sparsity and parameter averaging is challenging as averaging arbitrary sparse models reduces the overall sparsity due to differing sparse connectivities. This work addresses these challenges by demonstrating that exploring a single retraining phase of Iterative Magnitude Pruning (IMP) with varied hyperparameter configurations such as batch ordering or weight decay yields models suitable for averaging, sharing identical sparse connectivity by design. Averaging these models significantly enhances generalization and OOD performance over their individual counterparts. Building on this, we introduce Sparse Model Soups (SMS), a novel method for merging sparse models by initiating each prune-retrain cycle with the averaged model from the previous phase. SMS preserves sparsity, exploits sparse network benefits, is modular and fully parallelizable, and substantially improves IMP's performance. We further demonstrate that SMS can be adapted to enhance state-of-the-art pruning-during-training approaches.
△ Less
Submitted 23 March, 2024; v1 submitted 29 June, 2023;
originally announced June 2023.
-
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes
Authors:
Alexandre Maraval,
Matthieu Zimmer,
Antoine Grosnit,
Haitham Bou Ammar
Abstract:
Meta-Bayesian optimisation (meta-BO) aims to improve the sample efficiency of Bayesian optimisation by leveraging data from related tasks. While previous methods successfully meta-learn either a surrogate model or an acquisition function independently, joint training of both components remains an open challenge. This paper proposes the first end-to-end differentiable meta-BO framework that general…
▽ More
Meta-Bayesian optimisation (meta-BO) aims to improve the sample efficiency of Bayesian optimisation by leveraging data from related tasks. While previous methods successfully meta-learn either a surrogate model or an acquisition function independently, joint training of both components remains an open challenge. This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data. Early on, we notice that training transformer-based neural processes from scratch with RL is challenging due to insufficient supervision, especially when rewards are sparse. We formalise this claim with a combinatorial analysis showing that the widely used notion of regret as a reward signal exhibits a logarithmic sparsity pattern in trajectory lengths. To tackle this problem, we augment the RL objective with an auxiliary task that guides part of the architecture to learn a valid probabilistic model as an inductive bias. We demonstrate that our method achieves state-of-the-art regret results against various baselines in experiments on standard hyperparameter optimisation tasks and also outperforms others in the real-world problems of mixed-integer programming tuning, antibody design, and logic synthesis for electronic design automation.
△ Less
Submitted 22 December, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Interpretability Guarantees with Merlin-Arthur Classifiers
Authors:
Stephan Wäldchen,
Kartikey Sharma,
Berkant Turan,
Max Zimmer,
Sebastian Pokutta
Abstract:
We propose an interactive multi-agent classifier that provides provable interpretability guarantees even for complex agents such as neural networks. These guarantees consist of lower bounds on the mutual information between selected features and the classification decision. Our results are inspired by the Merlin-Arthur protocol from Interactive Proof Systems and express these bounds in terms of me…
▽ More
We propose an interactive multi-agent classifier that provides provable interpretability guarantees even for complex agents such as neural networks. These guarantees consist of lower bounds on the mutual information between selected features and the classification decision. Our results are inspired by the Merlin-Arthur protocol from Interactive Proof Systems and express these bounds in terms of measurable metrics such as soundness and completeness. Compared to existing interactive setups, we rely neither on optimal agents nor on the assumption that features are distributed independently. Instead, we use the relative strength of the agents as well as the new concept of Asymmetric Feature Correlation which captures the precise kind of correlations that make interpretability guarantees difficult. We evaluate our results on two small-scale datasets where high mutual information can be verified explicitly.
△ Less
Submitted 22 March, 2024; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Sample-Efficient Optimisation with Probabilistic Transformer Surrogates
Authors:
Alexandre Maraval,
Matthieu Zimmer,
Antoine Grosnit,
Rasul Tutunov,
Jun Wang,
Haitham Bou Ammar
Abstract:
Faced with problems of increasing complexity, recent research in Bayesian Optimisation (BO) has focused on adapting deep probabilistic models as flexible alternatives to Gaussian Processes (GPs). In a similar vein, this paper investigates the feasibility of employing state-of-the-art probabilistic transformers in BO. Upon further investigation, we observe two drawbacks stemming from their training…
▽ More
Faced with problems of increasing complexity, recent research in Bayesian Optimisation (BO) has focused on adapting deep probabilistic models as flexible alternatives to Gaussian Processes (GPs). In a similar vein, this paper investigates the feasibility of employing state-of-the-art probabilistic transformers in BO. Upon further investigation, we observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation. First, we notice that these models are trained on uniformly distributed inputs, which impairs predictive accuracy on non-uniform data - a setting arising from any typical BO loop due to exploration-exploitation trade-offs. Second, we realise that training losses (e.g., cross-entropy) only asymptotically guarantee accurate posterior approximations, i.e., after arriving at the global optimum, which generally cannot be ensured. At the stationary points of the loss function, however, we observe a degradation in predictive performance especially in exploratory regions of the input space. To tackle these shortcomings we introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance. In a large panel of experiments, we demonstrate, for the first time, that one transformer pre-trained on data sampled from random GP priors produces competitive results on 16 benchmark black-boxes compared to GP-based BO. Since our model is only pre-trained once and used in all tasks without any retraining and/or fine-tuning, we report an order of magnitude time-reduction, while matching and sometimes outperforming GPs.
△ Less
Submitted 30 May, 2022; v1 submitted 27 May, 2022;
originally announced May 2022.
-
Compression-aware Training of Neural Networks using Frank-Wolfe
Authors:
Max Zimmer,
Christoph Spiegel,
Sebastian Pokutta
Abstract:
Many existing Neural Network pruning approaches rely on either retraining or inducing a strong bias in order to converge to a sparse solution throughout training. A third paradigm, 'compression-aware' training, aims to obtain state-of-the-art dense models that are robust to a wide range of compression ratios using a single dense training run while also avoiding retraining. We propose a framework c…
▽ More
Many existing Neural Network pruning approaches rely on either retraining or inducing a strong bias in order to converge to a sparse solution throughout training. A third paradigm, 'compression-aware' training, aims to obtain state-of-the-art dense models that are robust to a wide range of compression ratios using a single dense training run while also avoiding retraining. We propose a framework centered around a versatile family of norm constraints and the Stochastic Frank-Wolfe (SFW) algorithm that encourage convergence to well-performing solutions while inducing robustness towards convolutional filter pruning and low-rank matrix decomposition. Our method is able to outperform existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly less computational resources than approaches based on nuclear-norm regularization. Our findings indicate that dynamically adjusting the learning rate of SFW, as suggested by Pokutta et al. (2020), is crucial for convergence and robustness of SFW-trained models and we establish a theoretical foundation for that practice.
△ Less
Submitted 14 February, 2024; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Neuro-Symbolic Hierarchical Rule Induction
Authors:
Claire Glanois,
Xuening Feng,
Zhaohui Jiang,
Paul Weng,
Matthieu Zimmer,
Dong Li,
Wulong Liu
Abstract:
We propose an efficient interpretable neuro-symbolic model to solve Inductive Logic Programming (ILP) problems. In this model, which is built from a set of meta-rules organised in a hierarchical structure, first-order rules are invented by learning embeddings to match facts and body predicates of a meta-rule. To instantiate it, we specifically design an expressive set of generic meta-rules, and de…
▽ More
We propose an efficient interpretable neuro-symbolic model to solve Inductive Logic Programming (ILP) problems. In this model, which is built from a set of meta-rules organised in a hierarchical structure, first-order rules are invented by learning embeddings to match facts and body predicates of a meta-rule. To instantiate it, we specifically design an expressive set of generic meta-rules, and demonstrate they generate a consequent fragment of Horn clauses. During training, we inject a controlled \pw{Gumbel} noise to avoid local optima and employ interpretability-regularization term to further guide the convergence to interpretable rules. We empirically validate our model on various tasks (ILP, visual genome, reinforcement learning) against several state-of-the-art methods.
△ Less
Submitted 26 December, 2021;
originally announced December 2021.
-
A Survey on Interpretable Reinforcement Learning
Authors:
Claire Glanois,
Paul Weng,
Matthieu Zimmer,
Dong Li,
Tianpei Yang,
Jianye Hao,
Wulong Liu
Abstract:
Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stake domains such as autonomous driving or medical applications. In such contexts, a learned policy needs for instance to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons…
▽ More
Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stake domains such as autonomous driving or medical applications. In such contexts, a learned policy needs for instance to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieve higher interpretability in reinforcement learning (RL). To that aim, we distinguish interpretability (as a property of a model) and explainability (as a post-hoc operation, with the intervention of a proxy) and discuss them in the context of RL with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to interpretable RL with an emphasis on papers published in the past 10 years. We also discuss briefly some related research areas and point to some potential promising research directions.
△ Less
Submitted 24 February, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
Bitemporal Property Graphs to Organize Evolving Systems
Authors:
Christopher Rost,
Philip Fritzsche,
Lucas Schons,
Maximilian Zimmer,
Dieter Gawlick,
Erhard Rahm
Abstract:
This work is a summarized view on the results of a one-year cooperation between Oracle Corp. and the University of Leipzig. The goal was to research the organization of relationships within multi-dimensional time-series data, such as sensor data from the IoT area. We showed in this project that temporal property graphs with some extensions are a prime candidate for this organizational task that co…
▽ More
This work is a summarized view on the results of a one-year cooperation between Oracle Corp. and the University of Leipzig. The goal was to research the organization of relationships within multi-dimensional time-series data, such as sensor data from the IoT area. We showed in this project that temporal property graphs with some extensions are a prime candidate for this organizational task that combines the strengths of both data models (graph and time-series). The outcome of the cooperation includes four achievements: (1) a bitemporal property graph model, (2) a temporal graph query language, (3) a conception of continuous event detection, and (4) a prototype of a bitemporal graph database that supports the model, language and event detection.
△ Less
Submitted 26 November, 2021;
originally announced November 2021.
-
How I Learned to Stop Worrying and Love Retraining
Authors:
Max Zimmer,
Christoph Spiegel,
Sebastian Pokutta
Abstract:
Many Neural Network Pruning approaches consist of several iterative training and pruning steps, seemingly losing a significant amount of their performance after pruning and then recovering it in the subsequent retraining phase. Recent works of Renda et al. (2020) and Le & Hua (2021) demonstrate the significance of the learning rate schedule during the retraining phase and propose specific heuristi…
▽ More
Many Neural Network Pruning approaches consist of several iterative training and pruning steps, seemingly losing a significant amount of their performance after pruning and then recovering it in the subsequent retraining phase. Recent works of Renda et al. (2020) and Le & Hua (2021) demonstrate the significance of the learning rate schedule during the retraining phase and propose specific heuristics for choosing such a schedule for IMP (Han et al., 2015). We place these findings in the context of the results of Li et al. (2020) regarding the training of models within a fixed training budget and demonstrate that, consequently, the retraining phase can be massively shortened using a simple linear learning rate schedule. Improving on existing retraining approaches, we additionally propose a method to adaptively select the initial value of the linear schedule. Going a step further, we propose similarly imposing a budget on the initial dense training phase and show that the resulting simple and efficient method is capable of outperforming significantly more complex or heavily parameterized state-of-the-art approaches that attempt to sparsify the network during training. These findings not only advance our understanding of the retraining phase, but more broadly question the belief that one should aim to avoid the need for retraining and reduce the negative effects of 'hard' pruning by incorporating the sparsification process into the standard training.
△ Less
Submitted 12 March, 2023; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Extracting Dynamical Models from Data
Authors:
Michael F. Zimmer
Abstract:
The problem of determining the underlying dynamics of a system when only given data of its state over time has challenged scientists for decades. In this paper, the approach of using machine learning to model the updates of the phase space variables is introduced; this is done as a function of the phase space variables. (More generally, the modeling is done over functions of the jet space.) This a…
▽ More
The problem of determining the underlying dynamics of a system when only given data of its state over time has challenged scientists for decades. In this paper, the approach of using machine learning to model the updates of the phase space variables is introduced; this is done as a function of the phase space variables. (More generally, the modeling is done over functions of the jet space.) This approach (named FJet) allows one to accurately replicate the dynamics, and is demonstrated on the examples of the damped harmonic oscillator, the damped pendulum, and the Duffing oscillator; the underlying differential equation is also accurately recovered for each example. In addition, the results in no way depend on how the data is sampled over time (i.e., regularly or irregularly). It is demonstrated that a regression implementation of FJet is similar to the model resulting from a Taylor series expansion of the Runge-Kutta (RK) numerical integration scheme. This identification confers the advantage of explicitly revealing the function space to use in the modeling, as well as the associated uncertainty quantification for the updates. Finally, it is shown in the undamped harmonic oscillator example that the stability of the updates is stable $10^9$ times longer than with $4$th-order RK (with time step $0.1$).
△ Less
Submitted 31 January, 2024; v1 submitted 13 October, 2021;
originally announced October 2021.
-
2nd-order Updates with 1st-order Complexity
Authors:
Michael F. Zimmer
Abstract:
It has long been a goal to efficiently compute and use second order information on a function ($f$) to assist in numerical approximations. Here it is shown how, using only basic physics and a numerical approximation, such information can be accurately obtained at a cost of ${\cal O}(N)$ complexity, where $N$ is the dimensionality of the parameter space of $f$. In this paper, an algorithm ({\em VA-…
▽ More
It has long been a goal to efficiently compute and use second order information on a function ($f$) to assist in numerical approximations. Here it is shown how, using only basic physics and a numerical approximation, such information can be accurately obtained at a cost of ${\cal O}(N)$ complexity, where $N$ is the dimensionality of the parameter space of $f$. In this paper, an algorithm ({\em VA-Flow}) is developed to exploit this second order information, and pseudocode is presented. It is applied to two classes of problems, that of inverse kinematics (IK) and gradient descent (GD). In the IK application, the algorithm is fast and robust, and is shown to lead to smooth behavior even near singularities. For GD the algorithm also works very well, provided the cost function is locally well-described by a polynomial.
△ Less
Submitted 27 May, 2021; v1 submitted 24 May, 2021;
originally announced May 2021.
-
Differentiable Logic Machines
Authors:
Matthieu Zimmer,
Xuening Feng,
Claire Glanois,
Zhaohui Jiang,
Jianyi Zhang,
Paul Weng,
Dong Li,
Jianye Hao,
Wulong Liu
Abstract:
The integration of reasoning, learning, and decision-making is key to build more general artificial intelligence systems. As a step in this direction, we propose a novel neural-logic architecture, called differentiable logic machine (DLM), that can solve both inductive logic programming (ILP) and reinforcement learning (RL) problems, where the solution can be interpreted as a first-order logic pro…
▽ More
The integration of reasoning, learning, and decision-making is key to build more general artificial intelligence systems. As a step in this direction, we propose a novel neural-logic architecture, called differentiable logic machine (DLM), that can solve both inductive logic programming (ILP) and reinforcement learning (RL) problems, where the solution can be interpreted as a first-order logic program. Our proposition includes several innovations. Firstly, our architecture defines a restricted but expressive continuous relaxation of the space of first-order logic programs by assigning weights to predicates instead of rules, in contrast to most previous neural-logic approaches. Secondly, with this differentiable architecture, we propose several (supervised and RL) training procedures, based on gradient descent, which can recover a fully-interpretable solution (i.e., logic formula). Thirdly, to accelerate RL training, we also design a novel critic architecture that enables actor-critic algorithms. Fourthly, to solve hard problems, we propose an incremental training procedure that can learn a logic program progressively. Compared to state-of-the-art (SOTA) differentiable ILP methods, DLM successfully solves all the considered ILP problems with a higher percentage of successful seeds (up to 3.5$\times$). On RL problems, without requiring an interpretable solution, DLM outperforms other non-interpretable neural-logic RL approaches in terms of rewards (up to 3.9%). When enforcing interpretability, DLM can solve harder RL problems (e.g., Sorting, Path) Moreover, we show that deep logic programs can be learned via incremental supervised training. In addition to this excellent performance, DLM can scale well in terms of memory and computational time, especially during the testing phase where it can deal with much more constants ($>$2$\times$) than SOTA.
△ Less
Submitted 5 July, 2023; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning
Authors:
Matthieu Zimmer,
Claire Glanois,
Umer Siddique,
Paul Weng
Abstract:
We consider the problem of learning fair policies in (deep) cooperative multi-agent reinforcement learning (MARL). We formalize it in a principled way as the problem of optimizing a welfare function that explicitly encodes two important aspects of fairness: efficiency and equity. As a solution method, we propose a novel neural network architecture, which is composed of two sub-networks specificall…
▽ More
We consider the problem of learning fair policies in (deep) cooperative multi-agent reinforcement learning (MARL). We formalize it in a principled way as the problem of optimizing a welfare function that explicitly encodes two important aspects of fairness: efficiency and equity. As a solution method, we propose a novel neural network architecture, which is composed of two sub-networks specifically designed for taking into account the two aspects of fairness. In experiments, we demonstrate the importance of the two sub-networks for fair optimization. Our overall approach is general as it can accommodate any (sub)differentiable welfare function. Therefore, it is compatible with various notions of fairness that have been proposed in the literature (e.g., lexicographic maximin, generalized Gini social welfare function, proportional fairness). Our solution method is generic and can be implemented in various MARL settings: centralized training and decentralized execution, or fully decentralized. Finally, we experimentally validate our approach in various domains and show that it can perform much better than previous methods.
△ Less
Submitted 22 June, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
Authors:
Jiancong Huang,
Juan Rojas,
Matthieu Zimmer,
Hongmin Wu,
Yisheng Guan,
Paul Weng
Abstract:
Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Fixing them incorrectly may negatively impact optimization performance leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resource…
▽ More
Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Fixing them incorrectly may negatively impact optimization performance leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources. The effects are further exacerbated when using single policies to solve multi-task learning problems. Observing that the Evidence Lower Bound (ELBO) used in Variational Auto-Encoders correlates with the diversity of image samples, we propose an auto-tuning technique based on the ELBO for self-supervised reinforcement learning. Our approach can auto-tune three hyperparameters: the replay buffer size, the number of policy gradient updates during each epoch, and the number of exploration steps during each epoch. We use a state-of-the-art self-supervised robot learning framework (Reinforcement Learning with Imagined Goals (RIG) using Soft Actor-Critic) as baseline for experimental verification. Experiments show that our method can auto-tune online and yields the best performance at a fraction of the time and computational resources. Code, video, and appendix for simulated and real-robot experiments can be found at the project page \url{www.JuanRojas.net/autotune}.
△ Less
Submitted 24 March, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Neograd: Near-Ideal Gradient Descent
Authors:
Michael F. Zimmer
Abstract:
The purpose of this paper is to improve upon existing variants of gradient descent by solving two problems: (1) removing (or reducing) the plateau that occurs while minimizing the cost function, (2) continually adjusting the learning rate to an "ideal" value. The approach taken is to approximately solve for the learning rate as a function of a trust metric. When this technique is hybridized with m…
▽ More
The purpose of this paper is to improve upon existing variants of gradient descent by solving two problems: (1) removing (or reducing) the plateau that occurs while minimizing the cost function, (2) continually adjusting the learning rate to an "ideal" value. The approach taken is to approximately solve for the learning rate as a function of a trust metric. When this technique is hybridized with momentum, it creates an especially effective gradient descent variant, called NeogradM. It is shown to outperform Adam on several test problems, and can easily reach cost function values that are smaller by a factor of $10^8$, for example.
△ Less
Submitted 2 August, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Deep Neural Network Training with Frank-Wolfe
Authors:
Sebastian Pokutta,
Christoph Spiegel,
Max Zimmer
Abstract:
This paper studies the empirical efficacy and benefits of using projection-free first-order methods in the form of Conditional Gradients, a.k.a. Frank-Wolfe methods, for training Neural Networks with constrained parameters. We draw comparisons both to current state-of-the-art stochastic Gradient Descent methods as well as across different variants of stochastic Conditional Gradients. In particular…
▽ More
This paper studies the empirical efficacy and benefits of using projection-free first-order methods in the form of Conditional Gradients, a.k.a. Frank-Wolfe methods, for training Neural Networks with constrained parameters. We draw comparisons both to current state-of-the-art stochastic Gradient Descent methods as well as across different variants of stochastic Conditional Gradients. In particular, we show the general feasibility of training Neural Networks whose parameters are constrained by a convex feasible region using Frank-Wolfe algorithms and compare different stochastic variants. We then show that, by choosing an appropriate region, one can achieve performance exceeding that of unconstrained stochastic Gradient Descent and matching state-of-the-art results relying on $L^2$-regularization. Lastly, we also demonstrate that, besides impacting performance, the particular choice of constraints can have a drastic impact on the learned representations.
△ Less
Submitted 21 October, 2020; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards
Authors:
Umer Siddique,
Paul Weng,
Matthieu Zimmer
Abstract:
As the operations of autonomous systems generally affect simultaneously several users, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the problem of learning a policy that treats its users equitably. In this paper, we formulate this novel RL problem, in which an objective function, which encodes a not…
▽ More
As the operations of autonomous systems generally affect simultaneously several users, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the problem of learning a policy that treats its users equitably. In this paper, we formulate this novel RL problem, in which an objective function, which encodes a notion of fairness that we formally define, is optimized. For this problem, we provide a theoretical discussion where we examine the case of discounted rewards and that of average rewards. During this analysis, we notably derive a new result in the standard RL setting, which is of independent interest: it states a novel bound on the approximation error with respect to the optimal average reward of that of a policy optimal for the discounted reward. Since learning with discounted rewards is generally easier, this discussion further justifies finding a fair policy for the average reward by learning a fair policy for the discounted reward. Thus, we describe how several classic deep RL algorithms can be adapted to our fair optimization problem, and we validate our approach with extensive experiments in three different domains.
△ Less
Submitted 18 August, 2020;
originally announced August 2020.
-
Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation
Authors:
Yijiong Lin,
Jiancong Huang,
Matthieu Zimmer,
Juan Rojas,
Paul Weng
Abstract:
Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its current application to robotics is currently hindered by high sample requirements. We propose two novel data augmentation techniques for DRL in order to reuse more efficiently observed data. The first one called Kaleidoscope Experience Replay exploits reflectional symmetries, while the second called Goal-…
▽ More
Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its current application to robotics is currently hindered by high sample requirements. We propose two novel data augmentation techniques for DRL in order to reuse more efficiently observed data. The first one called Kaleidoscope Experience Replay exploits reflectional symmetries, while the second called Goal-augmented Experience Replay takes advantage of lax goal definitions. Our preliminary experimental results show a large increase in learning speed.
△ Less
Submitted 15 November, 2019; v1 submitted 18 October, 2019;
originally announced October 2019.
-
Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning
Authors:
Yijiong Lin,
Jiancong Huang,
Matthieu Zimmer,
Yisheng Guan,
Juan Rojas,
Paul Weng
Abstract:
Deep Reinforcement Learning (RL) is a promising approach for adaptive robot control, but its current application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries from observed trajectories define transformations that leave the space of feasible RL trajectories invariant and…
▽ More
Deep Reinforcement Learning (RL) is a promising approach for adaptive robot control, but its current application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries from observed trajectories define transformations that leave the space of feasible RL trajectories invariant and can be used to generate new feasible trajectories, which could be used for training. Based on this data augmentation idea, we formulate a general framework, called Invariant Transform Experience Replay that we present with two techniques: (i) Kaleidoscope Experience Replay exploits reflectional symmetries and (ii) Goal-augmented Experience Replay which takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show significant increases in learning rates and success rates. Particularly, we attain a 13, 3, and 5 times speedup in the pushing, sliding, and pick-and-place tasks respectively in the multi-goal setting. Performance gains are also observed in similar tasks with obstacles and we successfully deployed a trained policy on a real Baxter robot. Our work demonstrates that invariant transformations on RL trajectories are a promising methodology to speed up learning in deep RL.
△ Less
Submitted 4 July, 2020; v1 submitted 24 September, 2019;
originally announced September 2019.
-
Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains
Authors:
Matthieu Zimmer,
Paul Weng
Abstract:
In the context of learning deterministic policies in continuous domains, we revisit an approach, which was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent performance empirically,…
▽ More
In the context of learning deterministic policies in continuous domains, we revisit an approach, which was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent performance empirically, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation to motivate this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss in depth the properties of these updates to get a deeper understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate in several classic control problems that it surpasses the state-of-the-art algorithms to learn deterministic policies.
△ Less
Submitted 24 June, 2019; v1 submitted 9 June, 2019;
originally announced June 2019.
-
Searching for Biophysically Realistic Parameters for Dynamic Neuron Models by Genetic Algorithms from Calcium Imaging Recording
Authors:
Magdalena Fuchs,
Manuel Zimmer,
Radu Grosu,
Ramin M. Hasani
Abstract:
Individual Neurons in the nervous systems exploit various dynamics. To capture these dynamics for single neurons, we tune the parameters of an electrophysiological model of nerve cells, to fit experimental data obtained by calcium imaging. A search for the biophysical parameters of this model is performed by means of a genetic algorithm, where the model neuron is exposed to a predefined input curr…
▽ More
Individual Neurons in the nervous systems exploit various dynamics. To capture these dynamics for single neurons, we tune the parameters of an electrophysiological model of nerve cells, to fit experimental data obtained by calcium imaging. A search for the biophysical parameters of this model is performed by means of a genetic algorithm, where the model neuron is exposed to a predefined input current representing overall inputs from other parts of the nervous system. The algorithm is then constrained for keeping the ion-channel currents within reasonable ranges, while producing the best fit to a calcium imaging time series of the AVA interneuron, from the brain of the soil-worm, C. elegans. Our settings enable us to project a set of biophysical parameters to the the neuron kinetics observed in neuronal imaging.
△ Less
Submitted 4 November, 2017;
originally announced November 2017.
-
Speedup from a different parametrization within the Neural Network algorithm
Authors:
Michael F. Zimmer
Abstract:
A different parametrization of the hyperplanes is used in the neural network algorithm. As demonstrated on several autoencoder examples it significantly outperforms the usual parametrization, reaching lower training error values with only a fraction of the number of epochs. It's argued that it makes it easier to understand and initialize the parameters.
A different parametrization of the hyperplanes is used in the neural network algorithm. As demonstrated on several autoencoder examples it significantly outperforms the usual parametrization, reaching lower training error values with only a fraction of the number of epochs. It's argued that it makes it easier to understand and initialize the parameters.
△ Less
Submitted 2 June, 2017; v1 submitted 19 May, 2017;
originally announced May 2017.