Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens

Erba, Vittorio; Troiani, Emanuele; Biggio, Luca; Maillard, Antoine; Zdeborová, Lenka

arXiv:2410.18858 (cond-mat)

[Submitted on 24 Oct 2024 (v1), last revised 21 May 2025 (this version, v2)]

Abstract: Current progress in artificial intelligence is centered around so-called large language models that consist of neural networks processing long sequences of high-dimensional vectors called tokens. Statistical physics provides powerful tools to study the functioning of learning with neural networks and has played a recognized role in the development of modern machine learning. The statistical physics approach relies on simplified and analytically tractable models of data. However, simple tractable models for long sequences of high-dimensional tokens are largely underexplored. Inspired by the crucial role that models such as the single-layer teacher-student perceptron (also known as generalized linear regression) played in the theory of fully connected neural networks, in this paper we introduce and study bilinear sequence regression (BSR) as one of the most basic models for sequences of tokens. We note that modern architectures naturally subsume the BSR model due to their skip connections. Building on recent methodological progress, we compute the Bayes-optimal generalization error for the model in the limit of long sequences of high-dimensional tokens, and provide a message-passing algorithm that matches this performance. We quantify the improvement that optimal learning brings with respect to vectorizing the sequence of tokens and learning via simple linear regression. We also unveil surprising properties of the gradient descent algorithms in the BSR model.
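
To make the comparison with vectorized linear regression concrete, here is a minimal sketch of a bilinear readout next to the same readout applied to the flattened sequence. The abstract does not spell out the model, so everything specific here is an assumption made for illustration: the rank-r factorization W = U V^T, the Gaussian inputs, the 1/sqrt(r L d) normalization, and all variable names and sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n samples, sequence length L, token dimension d, rank r.
n, L, d, r = 200, 8, 16, 2

# Assumed teacher: two factors whose product is a rank-r weight matrix.
U = rng.standard_normal((L, r))   # acts on sequence positions
V = rng.standard_normal((d, r))   # acts on token coordinates
W = U @ V.T / np.sqrt(r)          # L x d matrix of rank at most r

# Each sample is a sequence of L tokens, each a d-dimensional vector.
X = rng.standard_normal((n, L, d))

# Bilinear readout: contract positions with U and token coordinates with V.
y_bilinear = np.einsum("ak,mai,ik->m", U, X, V) / np.sqrt(r * L * d)

# Vectorized readout: flatten each sequence and apply one long weight vector.
y_vectorized = X.reshape(n, L * d) @ W.reshape(L * d) / np.sqrt(L * d)

# Both readouts define the same labels; they differ in how many free
# parameters a learner must estimate.
same_labels = np.allclose(y_bilinear, y_vectorized)
params_bilinear = r * (L + d)     # entries of U and V
params_vectorized = L * d         # entries of the flattened weight vector

print(same_labels, params_bilinear, params_vectorized)
```

Under these assumptions the two readouts produce identical labels, but a learner that ignores the low-rank structure has to fit L·d weights instead of r·(L + d). That gap is one way to read the improvement the abstract attributes to optimal learning over the vectorized baseline.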

Subjects:	Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
Cite as:	arXiv:2410.18858 [cond-mat.dis-nn]
	(or arXiv:2410.18858v2 [cond-mat.dis-nn] for this version)
	https://doi.org/10.48550/arXiv.2410.18858

Submission history

From: Vittorio Erba
[v1] Thu, 24 Oct 2024 15:44:03 UTC (2,104 KB)
[v2] Wed, 21 May 2025 14:57:11 UTC (1,849 KB)
