CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs
Authors:
Dung Nguyen Manh,
Thang Phan Chau,
Nam Le Hai,
Thong T. Doan,
Nam V. Nguyen,
Quang Pham,
Nghi D. Q. Bui
Abstract:
Recent advances in Code Large Language Models (CodeLLMs) have primarily focused on open-ended code generation, often overlooking the crucial aspect of code understanding and reasoning. To bridge this gap, we introduce CodeMMLU, a comprehensive multiple-choice benchmark designed to evaluate the depth of software and code comprehension in LLMs. CodeMMLU includes nearly 20,000 questions spanning dive…
▽ More
Recent advances in Code Large Language Models (CodeLLMs) have primarily focused on open-ended code generation, often overlooking the crucial aspect of code understanding and reasoning. To bridge this gap, we introduce CodeMMLU, a comprehensive multiple-choice benchmark designed to evaluate the depth of software and code comprehension in LLMs. CodeMMLU includes nearly 20,000 questions spanning diverse domains, including code analysis, defect detection, and software engineering principles across multiple programming languages. Unlike traditional benchmarks that emphasize code generation, CodeMMLU assesses a model's ability to reason about programs across a wide-range of tasks such as code repair, execution reasoning, and fill-in-the-blank challenges. Our extensive evaluation reveals that even state-of-the-art models struggle with CodeMMLU, highlighting significant gaps in comprehension beyond generation. By emphasizing the essential connection between code understanding and effective AI-assisted development, CodeMMLU provides a critical resource for advancing more reliable and capable coding assistants.
△ Less
Submitted 9 April, 2025; v1 submitted 2 October, 2024;
originally announced October 2024.
AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology
Authors:
Minh Huynh Nguyen,
Thang Phan Chau,
Phong X. Nguyen,
Nghi D. Q. Bui
Abstract:
Software agents have emerged as promising tools for addressing complex software engineering tasks. Existing works, on the other hand, frequently oversimplify software development workflows, despite the fact that such workflows are typically more complex in the real world. Thus, we propose AgileCoder, a multi agent system that integrates Agile Methodology (AM) into the framework. This system assign…
▽ More
Software agents have emerged as promising tools for addressing complex software engineering tasks. Existing works, on the other hand, frequently oversimplify software development workflows, despite the fact that such workflows are typically more complex in the real world. Thus, we propose AgileCoder, a multi agent system that integrates Agile Methodology (AM) into the framework. This system assigns specific AM roles - such as Product Manager, Developer, and Tester to different agents, who then collaboratively develop software based on user inputs. AgileCoder enhances development efficiency by organizing work into sprints, focusing on incrementally developing software through sprints. Additionally, we introduce Dynamic Code Graph Generator, a module that creates a Code Dependency Graph dynamically as updates are made to the codebase. This allows agents to better comprehend the codebase, leading to more precise code generation and modifications throughout the software development process. AgileCoder surpasses existing benchmarks, like ChatDev and MetaGPT, establishing a new standard and showcasing the capabilities of multi agent systems in advanced software engineering environments.
△ Less
Submitted 14 July, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.