Showing 1–1 of 1 results for author: Rogel-García, M

  1. arXiv:2507.00611 [pdf, ps, other]

    cs.LG cs.AI cs.RO

    Residual Reward Models for Preference-based Reinforcement Learning

    Authors: Chenyang Cao, Miguel Rogel-García, Mohamed Nabail, Xueqian Wang, Nicholas Rhinehart

    Abstract: Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify, avoiding heuristic and time-consuming reward design. However, PbRL can suffer from slow convergence since it requires training a reward model. Prior work has proposed learning a reward model from demonstrations and fine-tuning it usin…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 26 pages, 22 figures