AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models

Liu, Qi; Ruan, Jingqing; Li, Hao; Zhao, Haodong; Wang, Desheng; Chen, Jiansong; Guanglu, Wan; Cai, Xunliang; Zheng, Zhi; Xu, Tong

arXiv:2506.07165 (cs)

[Submitted on 8 Jun 2025]

Abstract: Existing multi-objective preference alignment methods for large language models (LLMs) face two limitations: (1) they cannot effectively balance the various preference dimensions, and (2) their reliance on auxiliary reward or reference models adds computational complexity. To address these challenges, we propose Adaptive Multi-objective Preference Optimization (AMoPO), a novel framework that achieves a dynamic balance across preference dimensions. By introducing a multi-objective optimization paradigm that uses dimension-aware generation metrics as implicit rewards, AMoPO aligns LLMs with diverse preferences without additional reward models or reference models. We also introduce an adaptive weight assignment mechanism that models the generation space as a Gaussian distribution, allowing dynamic prioritization of preference dimensions. Empirical results demonstrate that AMoPO outperforms state-of-the-art baselines by 28.5%, and experiments on 7B, 14B, and 32B models reveal its scaling ability. Moreover, additional analysis across multiple dimensions verifies its adaptability and effectiveness. These findings validate AMoPO's capability to achieve dimension-aware preference alignment, highlighting its superiority. Our code and datasets are available at this https URL.

Comments:	Accepted by ACL 2025
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.07165 [cs.LG]
	(or arXiv:2506.07165v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.07165

Submission history

From: Liu Qi
[v1] Sun, 8 Jun 2025 14:31:06 UTC (13,134 KB)
