Kalman Bayesian Transformer

Jing, Haoming; Wright, Oren; Moura, José M. F.; Nakahira, Yorie

arXiv:2509.10695 (cs)

[Submitted on 12 Sep 2025]

Abstract: Sequential fine-tuning of transformers is useful when new data arrive sequentially, especially with shifting distributions. Unlike batch learning, sequential learning demands that training be stabilized despite a small amount of data by balancing new information and previously learned knowledge in the pre-trained models. This challenge is further complicated when training is to be completed in latency-critical environments and learning must additionally quantify and be mediated by uncertainty. Motivated by these challenges, we propose a novel method that frames sequential fine-tuning as a posterior inference problem within a Bayesian framework. Our approach integrates closed-form moment propagation of random variables, Kalman Bayesian Neural Networks, and Taylor approximations of the moments of softmax functions. By explicitly accounting for pre-trained models as priors and adaptively balancing them against new information based on quantified uncertainty, our method achieves robust and data-efficient sequential learning. The effectiveness of our method is demonstrated through numerical simulations involving sequential adaptation of a decision transformer to tasks characterized by distribution shifts and limited memory resources.
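To make the idea of treating pre-trained weights as a prior and balancing them against new data by uncertainty more concrete, the following is a minimal sketch of a textbook Kalman-style Bayesian update for a single linear model with scalar output. It is not the method from the paper, which additionally involves moment propagation through transformer layers and Taylor approximations of softmax moments. The function name `kalman_update` and the parameter `noise_var` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a standard Kalman update for the linear model
# y = w . x + noise with a Gaussian belief over w. This is NOT the paper's
# algorithm; all names here are hypothetical.

def kalman_update(mean, cov, x, y, noise_var):
    """Return the posterior (mean, cov) over weights after one observation.

    mean      -- prior mean, e.g. pre-trained weights (list of n floats)
    cov       -- prior covariance (n x n list of lists, assumed symmetric)
    x         -- input features (list of n floats)
    y         -- observed scalar target
    noise_var -- assumed observation noise variance
    """
    n = len(mean)
    # P x: prior covariance applied to the input.
    px = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Innovation variance S = x^T P x + r.
    s = sum(x[i] * px[i] for i in range(n)) + noise_var
    # Kalman gain K = P x / S: large when the prior is uncertain.
    gain = [p / s for p in px]
    # Prediction error under the prior mean.
    residual = y - sum(mean[i] * x[i] for i in range(n))
    new_mean = [mean[i] + gain[i] * residual for i in range(n)]
    # P_new = P - K (P x)^T, using the symmetry of P.
    new_cov = [[cov[i][j] - gain[i] * px[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


if __name__ == "__main__":
    x, y, r = [1.0, 0.0], 2.0, 0.1
    prior_mean = [1.0, 0.0]
    confident = [[0.01, 0.0], [0.0, 0.01]]  # low prior uncertainty
    uncertain = [[1.0, 0.0], [0.0, 1.0]]    # high prior uncertainty
    print(kalman_update(prior_mean, confident, x, y, r)[0])
    print(kalman_update(prior_mean, uncertain, x, y, r)[0])
```

In this toy setting, the same observation moves the weight estimate only slightly when the prior covariance is small and much further when it is large, which is the kind of uncertainty-mediated trade-off between prior knowledge and new data that the abstract describes.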

Comments:	Accepted to the 64th IEEE Conference on Decision and Control (CDC 2025)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2509.10695 [cs.LG]
	(or arXiv:2509.10695v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2509.10695

Submission history

From: Haoming Jing
[v1] Fri, 12 Sep 2025 21:15:23 UTC (684 KB)
