CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs
Authors:
A. Chervov,
A. Soibelman,
S. Lytkin,
I. Kiselev,
S. Fironov,
A. Lukyanenko,
A. Dolgorukova,
A. Ogurtsov,
F. Petrov,
S. Krymskii,
M. Evseev,
L. Grunvald,
D. Gorodkov,
G. Antiufeev,
G. Verbii,
V. Zamkovoy,
L. Cheldieva,
I. Koltsov,
A. Sychev,
M. Obozov,
A. Eliseev,
S. Nikolenko,
N. Narynbaev,
R. Turtayev,
N. Rokotyan
, et al. (9 additional authors not shown)
Abstract:
This paper is the second in a series of studies on developing efficient artificial intelligence-based approaches to pathfinding on extremely large graphs (e.g. $10^{70}$ nodes) with a focus on Cayley graphs and mathematical applications. The open-source CayleyPy project is a central component of our research. The present paper proposes a novel combination of a reinforcement learning approach with…
▽ More
This paper is the second in a series of studies on developing efficient artificial intelligence-based approaches to pathfinding on extremely large graphs (e.g. $10^{70}$ nodes) with a focus on Cayley graphs and mathematical applications. The open-source CayleyPy project is a central component of our research. The present paper proposes a novel combination of a reinforcement learning approach with a more direct diffusion distance approach from the first paper. Our analysis includes benchmarking various choices for the key building blocks of the approach: architectures of the neural network, generators for the random walks and beam search pathfinding. We compared these methods against the classical computer algebra system GAP, demonstrating that they "overcome the GAP" for the considered examples. As a particular mathematical application we examine the Cayley graph of the symmetric group with cyclic shift and transposition generators. We provide strong support for the OEIS-A186783 conjecture that the diameter is equal to n(n-1)/2 by machine learning and mathematical methods. We identify the conjectured longest element and generate its decomposition of the desired length. We prove a diameter lower bound of n(n-1)/2-n/2 and an upper bound of n(n-1)/2+ 3n by presenting the algorithm with given complexity. We also present several conjectures motivated by numerical experiments, including observations on the central limit phenomenon (with growth approximated by a Gumbel distribution), the uniform distribution for the spectrum of the graph, and a numerical study of sorting networks. To stimulate crowdsourcing activity, we create challenges on the Kaggle platform and invite contributions to improve and benchmark approaches on Cayley graph pathfinding and other tasks.
△ Less
Submitted 25 February, 2025;
originally announced February 2025.
Fast Inference of Mixture-of-Experts Language Models with Offloading
Authors:
Artyom Eliseev,
Denis Mazur
Abstract:
With the widespread adoption of Large Language Models (LLMs), many deep learning practitioners are looking for strategies of running these models more efficiently. One such strategy is to use sparse Mixture-of-Experts (MoE) - a type of model architectures where only a fraction of model layers are active for any given input. This property allows MoE-based language models to generate tokens faster t…
▽ More
With the widespread adoption of Large Language Models (LLMs), many deep learning practitioners are looking for strategies of running these models more efficiently. One such strategy is to use sparse Mixture-of-Experts (MoE) - a type of model architectures where only a fraction of model layers are active for any given input. This property allows MoE-based language models to generate tokens faster than their dense counterparts, but it also increases model size due to having multiple experts. Unfortunately, this makes state-of-the-art MoE language models difficult to run without high-end GPUs. In this work, we study the problem of running large MoE language models on consumer hardware with limited accelerator memory. We build upon parameter offloading algorithms and propose a novel strategy that accelerates offloading by taking advantage of innate properties of MoE LLMs. Using this strategy, we build can run Mixtral-8x7B with mixed quantization on desktop hardware and free-tier Google Colab instances.
△ Less
Submitted 28 December, 2023;
originally announced December 2023.