Showing 1–1 of 1 results for author: Bensal, S

  1. arXiv:2505.24726

    cs.CL

    Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

    Authors: Shelly Bensal, Umar Jamil, Christopher Bryant, Melisa Russak, Kiran Kamble, Dmytro Mozolevskyi, Muayad Ali, Waseem AlShikh

    Abstract: We explore a method for improving the performance of large language models through self-reflection and reinforcement learning. By incentivizing the model to generate better self-reflections when it answers incorrectly, we demonstrate that a model's ability to solve complex, verifiable tasks can be enhanced even when generating synthetic data is infeasible and only binary feedback is available. Our…

    Submitted 30 May, 2025; originally announced May 2025.