Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games

Abstract: Multi-agent reinforcement learning, despite its popularity and empirical success, faces significant scalability challenges in large-population dynamic games. Graphon mean field games (GMFGs) offer a principled framework for approximating such games while capturing heterogeneity among players. In this paper, we propose and analyze a policy optimization framework for continuous-time, finite-horizon linear-quadratic GMFGs. Exploiting the structural properties of GMFGs, we design an efficient policy parameterization in which each player's policy is represented as an affine function of their private state, with a shared slope function and player-specific intercepts. We develop a bilevel optimization algorithm that alternates between policy gradient updates for best-response computation under a fixed population distribution, and distribution updates using the resulting policies. We prove linear convergence of the policy gradient steps to best-response policies and establish global convergence of the overall algorithm to the Nash equilibrium. The analysis relies on novel landscape characterizations over infinite-dimensional policy spaces. Numerical experiments demonstrate the convergence and robustness of the proposed algorithm under varying graphon structures, noise levels, and action frequencies.

Submitted 6 June, 2025; originally announced June 2025.

MSC Class: 68Q25; 91A15; 49N80; 91A07; 91A43; 49N10

arXiv:2409.15022

A Diagonal Structured State Space Model on Loihi 2 for Efficient Streaming Sequence Processing

Authors: Svea Marie Meyer, Philipp Weidel, Philipp Plank, Leobardo Campos-Macias, Sumit Bam Shrestha, Philipp Stratmann, Mathis Richter

Abstract: Deep State-Space Models (SSMs) demonstrate state-of-the-art performance on long-range sequence modeling tasks. While the recurrent structure of SSMs can be efficiently implemented as a convolution or as a parallel scan during training, recurrent token-by-token processing cannot currently be implemented efficiently on GPUs. Here, we demonstrate efficient token-by-token inference of the SSM S4D on Intel's Loihi 2 state-of-the-art neuromorphic processor. We compare this first-ever neuromorphic-hardware implementation of an SSM on sMNIST, psMNIST, and sCIFAR to a recurrent and a convolutional implementation of S4D on Jetson Orin Nano (Jetson). While we find Jetson to perform better in an offline, sample-by-sample batched processing mode, Loihi 2 outperforms it during token-by-token processing, where it consumes 1000 times less energy with 75 times lower latency and 75 times higher throughput compared to the recurrent implementation of S4D on Jetson. This opens up new avenues towards efficient real-time streaming applications of SSMs.

Submitted 23 September, 2024; originally announced September 2024.

Comments: 6 pages, 2 figures

arXiv:2107.03992

A Long Short-Term Memory for AI Applications in Spike-based Neuromorphic Hardware

Authors: Philipp Plank, Arjun Rao, Andreas Wild, Wolfgang Maass

Abstract: Spike-based neuromorphic hardware holds the promise of providing more energy-efficient implementations of Deep Neural Networks (DNNs) than standard hardware such as GPUs. But this requires understanding how DNNs can be emulated in an event-based sparse firing regime, since otherwise the energy advantage is lost. In particular, DNNs that solve sequence processing tasks typically employ Long Short-Term Memory (LSTM) units that are hard to emulate with few spikes. We show that a facet of many biological neurons, slow after-hyperpolarizing (AHP) currents after each spike, provides an efficient solution. AHP currents can easily be implemented in neuromorphic hardware that supports multi-compartment neuron models, such as Intel's Loihi chip. Filter approximation theory explains why AHP neurons can emulate the function of LSTM units. This yields a highly energy-efficient approach to time series classification. Furthermore, it provides the basis for implementing, with very sparse firing, an important class of large DNNs that extract relations between words and sentences in a text in order to answer questions about the text.

Submitted 7 November, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: Philipp Plank and Arjun Rao have contributed equally to this work as first authors

Showing 1–3 of 3 results for author: Plank, P