SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Ding, Mucong; Chakraborty, Souradip; Agrawal, Vibhu; Che, Zora; Koppel, Alec; Wang, Mengdi; Bedi, Amrit; Huang, Furong

arXiv:2406.15567v1 (cs)

[Submitted on 21 Jun 2024]

Abstract: Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation and suffers from distribution shift issues. To address this, we establish that online LLM alignment is underpinned by bilevel optimization. By reducing this formulation to an efficient single-level first-order method (using the reward-policy equivalence), our approach generates new samples and iteratively refines model alignment by exploring responses and regulating preference labels. In doing so, we permit alignment methods to operate in an online and self-improving manner, as well as generalize prior online RLHF methods as special cases. Compared to state-of-the-art iterative RLHF methods, our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.
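
For readers unfamiliar with the reward-policy equivalence mentioned in the abstract, the following is a minimal sketch of the standard offline DPO loss for a single preference pair, in which the reward is expressed implicitly as a scaled log-ratio between the policy and a fixed reference model. It is not the SAIL objective, which the abstract does not spell out. The function names, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float) -> float:
    """Implicit reward beta * log(pi(y|x) / pi_ref(y|x)).

    The prompt-dependent normalizer is omitted because it cancels
    when two responses to the same prompt are compared.
    """
    return beta * (logp_policy - logp_ref)


def dpo_loss(
    logp_w: float,      # policy log-prob of the preferred response
    logp_l: float,      # policy log-prob of the rejected response
    ref_logp_w: float,  # reference log-prob of the preferred response
    ref_logp_l: float,  # reference log-prob of the rejected response
    beta: float = 0.1,  # illustrative default, not taken from the paper
) -> float:
    """Standard DPO loss for one (preferred, rejected) pair."""
    margin = implicit_reward(logp_w, ref_logp_w, beta) - implicit_reward(
        logp_l, ref_logp_l, beta
    )
    return -log_sigmoid(margin)
```

When the policy equals the reference model, the margin is zero and the loss is log 2. The loss falls as the policy raises the preferred response's probability relative to the rejected one. Offline methods evaluate a loss of this kind on a fixed dataset, whereas the online setting described in the abstract draws new responses from the current policy.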

Comments:	24 pages, 6 figures, 3 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as:	arXiv:2406.15567 [cs.LG]
	(or arXiv:2406.15567v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.15567

Submission history

From: Mucong Ding
[v1] Fri, 21 Jun 2024 18:05:35 UTC (3,000 KB)
