T-POP: Test-Time Personalization with Online Preference Feedback

Qu, Zikun; Zhang, Min; Kong, Mingze; Li, Xiang; Shang, Zhiwei; Wang, Zhiyong; Ban, Yikun; Qiu, Shuang; Shu, Yao; Dai, Zhongxiang

arXiv:2509.24696 (cs)

[Submitted on 29 Sep 2025]

Abstract: Personalizing large language models (LLMs) to individual user preferences is a critical step beyond generating generically helpful responses. However, current personalization methods are ill-suited for new users, as they typically require either slow, resource-intensive fine-tuning or a substantial amount of pre-existing user data, creating a significant cold-start problem. To address this challenge, we introduce a new paradigm for real-time personalization by learning from online pairwise preference feedback collected during text generation. We propose T-POP (Test-Time Personalization with Online Preference Feedback), a novel algorithm that combines test-time alignment with dueling bandits. Without updating the LLM parameters, T-POP steers the decoding process of a frozen LLM by learning a reward function online that captures user preferences. By leveraging dueling bandits, T-POP chooses which pairs of candidates to show the user so as to balance exploring their preferences against exploiting the learned knowledge to generate personalized text. Extensive experiments demonstrate that T-POP achieves rapid and data-efficient personalization, significantly outperforming existing baselines and showing consistent improvement with more user interactions.
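The abstract gives no algorithmic details, so the following is only a toy illustration of the general idea of learning a reward function online from pairwise preferences with a dueling bandit. It is not the authors' T-POP algorithm. Everything in it is an assumption made for illustration: the linear Bradley-Terry reward model, the diagonal uncertainty proxy, the feature vectors standing in for candidate continuations from a frozen LLM, the noise-free simulated user, and the names `LinearDuelingBandit`, `simulate`, and `TRUE_THETA`.

```python
import math
import random

# Hypothetical hidden user preference weights (unknown to the learner).
TRUE_THETA = [1.0, -0.5, 0.0]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class LinearDuelingBandit:
    """Toy linear dueling bandit with a Bradley-Terry preference model."""

    def __init__(self, dim, lr=0.5, bonus=1.0):
        self.theta = [0.0] * dim   # estimated reward weights
        self.counts = [1.0] * dim  # diagonal proxy for accumulated information
        self.lr = lr
        self.bonus = bonus

    def reward(self, x):
        return dot(self.theta, x)

    def select_pair(self, candidates):
        # First arm: exploit the current reward estimate.
        first = max(range(len(candidates)),
                    key=lambda i: self.reward(candidates[i]))

        # Second arm: estimated reward plus an exploration bonus that is
        # large when the difference from the first arm is poorly explored.
        def optimistic(i):
            diff = [a - b for a, b in zip(candidates[i], candidates[first])]
            width = math.sqrt(sum(d * d / c for d, c in zip(diff, self.counts)))
            return self.reward(candidates[i]) + self.bonus * width

        second = max((i for i in range(len(candidates)) if i != first),
                     key=optimistic)
        return first, second

    def update(self, winner, loser):
        # One gradient-ascent step on the Bradley-Terry log-likelihood.
        diff = [w - l for w, l in zip(winner, loser)]
        z = max(-30.0, min(30.0, dot(self.theta, diff)))  # avoid overflow
        p_win = 1.0 / (1.0 + math.exp(-z))
        for j, d in enumerate(diff):
            self.theta[j] += self.lr * (1.0 - p_win) * d
            self.counts[j] += d * d


def simulate(rounds=50, seed=0, n_candidates=4):
    """Run the bandit against a noise-free simulated user."""
    rng = random.Random(seed)
    bandit = LinearDuelingBandit(dim=len(TRUE_THETA))
    for _ in range(rounds):
        # Stand-ins for feature vectors of candidate continuations.
        candidates = [[rng.uniform(-1.0, 1.0) for _ in TRUE_THETA]
                      for _ in range(n_candidates)]
        i, j = bandit.select_pair(candidates)
        # The simulated user prefers the candidate with higher true reward.
        if dot(TRUE_THETA, candidates[i]) >= dot(TRUE_THETA, candidates[j]):
            bandit.update(candidates[i], candidates[j])
        else:
            bandit.update(candidates[j], candidates[i])
    return bandit.theta
```

Calling `simulate()` returns the learned weight vector. With the noise-free user assumed here, every update moves the estimate toward the preferred candidate, so the learned weights correlate positively with `TRUE_THETA`. A real system would replace the random feature vectors with representations of actual LLM candidates and the simulated user with live feedback.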

Comments:	Preprint
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2509.24696 [cs.LG]
	(or arXiv:2509.24696v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2509.24696

Submission history

From: Zhongxiang Dai
[v1] Mon, 29 Sep 2025 12:28:23 UTC (259 KB)
