CodeT: Code Generation with Generated Tests

Chen, Bei; Zhang, Fengji; Nguyen, Anh; Zan, Daoguang; Lin, Zeqi; Lou, Jian-Guang; Chen, Weizhu

Computer Science > Computation and Language

arXiv:2207.10397v1 (cs)

[Submitted on 21 Jul 2022 (this version), latest version 23 Nov 2022 (v2)]

Abstract: Given a programming problem, pre-trained language models such as Codex have demonstrated the ability to generate multiple different code solutions via sampling. However, selecting a correct or best solution from those samples remains a challenge. While an easy way to verify the correctness of a code solution is to execute test cases, producing high-quality test cases is prohibitively expensive. In this paper, we explore the use of pre-trained language models to automatically generate test cases, calling our method CodeT: Code generation with generated Tests. CodeT executes the code solutions using the generated test cases, and then chooses the best solution based on a dual execution agreement with both the generated test cases and other generated solutions. We evaluate CodeT on five different pre-trained models with both the HumanEval and MBPP benchmarks. Extensive experimental results demonstrate that CodeT can achieve significant, consistent, and surprising improvements over previous methods. For example, CodeT improves pass@1 on HumanEval to 65.8% with the code-davinci-002 model, an absolute increase of 18.8%, and an absolute improvement of more than 20% over previous state-of-the-art results.
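
To make "dual execution agreement" concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a scoring rule that the abstract does not state: solutions that pass exactly the same set of generated tests form one consensus group, each group is scored as (number of solutions in the group) × (number of tests they pass), and a solution from the top group is returned. The names `passes` and `dual_agreement_select`, and the representation of tests as Python predicates, are hypothetical; a real system would run untrusted generated code in a sandbox with timeouts.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List

# Hypothetical representation: a solution is a callable, and a generated
# test is a predicate that calls the solution and returns True on success.
Solution = Callable[..., object]
Test = Callable[[Solution], bool]


def passes(solution: Solution, test: Test) -> bool:
    """Run one generated test on one solution; any exception counts as a failure."""
    try:
        return bool(test(solution))
    except Exception:
        return False


def dual_agreement_select(solutions: List[Solution], tests: List[Test]) -> int:
    """Return the index of the chosen solution (assumed scoring rule, see lead-in).

    Solutions that pass exactly the same set of tests form a consensus group.
    Each group is scored as len(group) * len(tests passed by the group), so a
    choice is favored when it agrees with both many tests and many other samples.
    """
    groups: Dict[FrozenSet[int], List[int]] = defaultdict(list)
    for i, sol in enumerate(solutions):
        passed = frozenset(j for j, t in enumerate(tests) if passes(sol, t))
        groups[passed].append(i)

    # Ties go to the group encountered first (dict insertion order).
    _, best_members = max(groups.items(), key=lambda kv: len(kv[1]) * len(kv[0]))
    return best_members[0]


# Toy problem: "return the sum of two integers".
candidates = [
    lambda a, b: a + b,
    lambda a, b: a + b,  # a second, identical correct sample
    lambda a, b: a - b,  # wrong
    lambda a, b: a * b,  # wrong
]
generated_tests = [
    lambda f: f(1, 2) == 3,
    lambda f: f(0, 0) == 0,
    lambda f: f(2, 2) == 4,
    lambda f: f(5, 3) == 2,  # an incorrect generated test
]
print(dual_agreement_select(candidates, generated_tests))  # prints 0
```

In this toy run the two correct samples agree on three tests (score 2 × 3 = 6), while each wrong sample forms a group of one passing two tests (score 2). The incorrect generated test only rewards a wrong sample that no other sample agrees with, so it does not change the choice. This is the intuition behind combining agreement with tests and agreement among solutions.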

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Software Engineering (cs.SE)
Cite as:	arXiv:2207.10397 [cs.CL]
	(or arXiv:2207.10397v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2207.10397

Submission history

From: Bei Chen
[v1] Thu, 21 Jul 2022 10:18:37 UTC (483 KB)
[v2] Wed, 23 Nov 2022 07:42:10 UTC (2,209 KB)