Coarse-Tuning Models of Code with Reinforcement Learning Feedback

Jain, Abhinav; Adiole, Chima; Chaudhuri, Swarat; Reps, Thomas; Jermaine, Chris

arXiv:2305.18341 (cs)

[Submitted on 25 May 2023 (v1), last revised 23 Dec 2023 (this version, v2)]

Title: Coarse-Tuning Models of Code with Reinforcement Learning Feedback

Authors: Abhinav Jain (1), Chima Adiole (1), Swarat Chaudhuri (2), Thomas Reps (3), Chris Jermaine (1) ((1) Rice University, (2) UT Austin, (3) University of Wisconsin)

Abstract: Large Language Models (LLMs) pre-trained on code have recently emerged as the dominant approach to program synthesis. However, these models are trained using next-token prediction, which ignores the syntax and semantics of code. We propose RLCF, an approach that further trains a pre-trained LLM via reinforcement learning, using feedback from a grounding function that scores the quality of the code. The grounding function uses (i) compiler-derived feedback on whether the generated code passes a set of correctness checks; and (ii) feedback from a different LLM that compares the generated code to reference code. RLCF is model- and language-agnostic. We empirically evaluate it on the MBJP and MathQA tasks for Java. Our experiments show that RLCF raises the odds that an LLM-generated program compiles, is executable, and produces the right output on tests, often allowing LLMs to match the performance of 2x-8x larger LLMs.
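
As a rough illustration of the two feedback sources named in the abstract, the sketch below combines a pass/fail check with a comparison score into a single scalar reward. It is not the paper's grounding function: the name `grounding_score`, the gating rule, the `failure_penalty` value, the clamping to [0, 1], and the brace-counting stand-in for a compiler check are all assumptions made for this example.

```python
from typing import Callable


def grounding_score(
    code: str,
    passes_checks: Callable[[str], bool],
    similarity_to_reference: Callable[[str], float],
    failure_penalty: float = -1.0,
) -> float:
    """Toy reward (hypothetical): gate a comparison score on a pass/fail check.

    passes_checks stands in for compiler-derived feedback.
    similarity_to_reference stands in for the second LLM's comparison
    of the generated code with reference code.
    """
    # Assumption: code that fails the checks gets a fixed penalty.
    if not passes_checks(code):
        return failure_penalty
    # Assumption: the comparison score is clamped to the range [0, 1].
    score = similarity_to_reference(code)
    return max(0.0, min(1.0, score))


def balanced_braces(code: str) -> bool:
    """Crude stand-in for a compiler check: are curly braces balanced?"""
    depth = 0
    for ch in code:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

In a reinforcement learning loop, a scalar like this would serve as the reward for each sampled program. A real setup would replace `balanced_braces` with actual compiler or static-analysis output and the comparison callable with a trained model.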

Comments:	23 pages
Subjects:	Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2305.18341 [cs.PL]
	(or arXiv:2305.18341v2 [cs.PL] for this version)
	https://doi.org/10.48550/arXiv.2305.18341

Submission history

From: Abhinav Jain
[v1] Thu, 25 May 2023 22:09:08 UTC (2,834 KB)
[v2] Sat, 23 Dec 2023 20:00:52 UTC (6,065 KB)
