
Showing 1–3 of 3 results for author: Diao, S

Searching in archive stat.
  1. arXiv:2402.18571 [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

    Authors: Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang

    Abstract: Fine-grained control over large language models (LLMs) remains a significant challenge, hindering their adaptability to diverse user needs. While Reinforcement Learning from Human Feedback (RLHF) shows promise in aligning LLMs, its reliance on scalar rewards often limits its ability to capture diverse user preferences in real-world applications. To address this limitation, we introduce the Directi…

    Submitted 6 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: The code and model are released at https://github.com/Haoxiang-Wang/directional-preference-alignment

  2. arXiv:2304.06767 [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

    Authors: Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang

    Abstract: Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-worl…

    Submitted 1 December, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 29 pages, 12 figures, Published in Transactions on Machine Learning Research (TMLR)

  3. arXiv:2211.11638 [pdf, other]

    cs.LG stat.ML

    Normalizing Flow with Variational Latent Representation

    Authors: Hanze Dong, Shizhe Diao, Weizhong Zhang, Tong Zhang

    Abstract: Normalizing flow (NF) has gained popularity over traditional maximum likelihood based methods due to its strong capability to model complex data distributions. However, the standard approach, which maps the observed data to a normal distribution, has difficulty in handling data distributions with multiple relatively isolated modes. To overcome this issue, we propose a new framework based on variat…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: 24 pages, 7 figures