-
Whodunit: Classifying Code as Human Authored or GPT-4 Generated -- A case study on CodeChef problems
Authors:
Oseremen Joy Idialu,
Noble Saji Mathews,
Rungroj Maipradit,
Joanne M. Atlee,
Mei Nagappan
Abstract:
Artificial intelligence (AI) assistants such as GitHub Copilot and ChatGPT, built on large language models like GPT-4, are revolutionizing how programming tasks are performed, raising questions about whether code is authored by generative AI models. Such questions are of particular interest to educators, who worry that these tools enable a new form of academic dishonesty, in which students submit AI-generated code as their own work. Our research explores the viability of using code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code. Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4. Our classifier outperforms baselines, achieving an F1-score and AUC-ROC score of 0.91. A variant of our classifier that excludes gameable features (e.g., empty lines, whitespace) still performs well, with an F1-score and AUC-ROC score of 0.89. We also evaluated our classifier with respect to the difficulty of the programming problem and found almost no difference between easier and intermediate problems, and the classifier performed only slightly worse on harder problems. Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.
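The "gameable" layout features mentioned above (empty lines, whitespace) can be illustrated with a small Python sketch. The function `layout_features` and its feature names are hypothetical; the abstract does not list the paper's actual feature set, so this only shows the kind of signal such features capture.

```python
def layout_features(source: str) -> dict[str, float]:
    """Compute a few whitespace/layout stylometry features for one snippet.

    Illustrative assumption only: not the feature set used in the paper.
    """
    lines = source.splitlines()
    n_lines = len(lines) or 1   # avoid division by zero on empty input
    n_chars = len(source) or 1
    empty = sum(1 for line in lines if not line.strip())
    whitespace = sum(1 for ch in source if ch.isspace())
    # Indentation width (leading spaces/tabs) of each non-empty line.
    indents = [len(line) - len(line.lstrip()) for line in lines if line.strip()]
    return {
        "empty_line_ratio": empty / n_lines,
        "whitespace_ratio": whitespace / n_chars,
        "avg_line_length": sum(len(line) for line in lines) / n_lines,
        "mean_indent": sum(indents) / len(indents) if indents else 0.0,
        "tab_indented_lines": float(sum(1 for line in lines if line.startswith("\t"))),
    }


features = layout_features("int main() {\n\n    return 0;\n}\n")
```

Because a student can trivially reformat code to shift these values, a classifier that drops them (as the 0.89 variant does) is harder to evade.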
Submitted 6 March, 2024;
originally announced March 2024.
-
Studying the association between Gitcoin's issues and resolving outcomes
Authors:
Morakot Choetkiertikul,
Arada Puengmongkolchaikit,
Pandaree Chandra,
Chaiyong Ragkhitwetsagul,
Rungroj Maipradit,
Hideaki Hata,
Thanwadee Sunetnanta,
Kenichi Matsumoto
Abstract:
The development of open-source software (OSS) projects has usually been driven by collaboration among contributors and relies strongly on volunteering. Thus, allocating software practitioners (e.g., contributors) to a particular task is non-trivial and draws attention away from development. Therefore, a number of bug bounty platforms have emerged to address this problem through bounty rewards. In particular, Gitcoin, a new bounty platform, introduces a bounty reward mechanism that allows individual issue owners (backers) to define a reward value using cryptocurrencies rather than crowdfunding mechanisms. Although a number of studies have investigated the phenomenon on bounty platforms, those studies rely on different bounty reward systems. Our study thus investigates the association between Gitcoin bounties and their outcomes (i.e., success and non-success). We empirically study over 4,000 issues with Gitcoin bounties using statistical analysis and machine learning techniques. We also conducted a comparative study with the Bountysource platform to gain insights into the usage of both platforms. Our study highlights the importance of factors such as the length of the project, the issue description, the type of bounty issue, and the bounty value, which are found to be highly correlated with the outcome of bounty issues. These findings can provide useful guidance to practitioners.
Submitted 26 September, 2023;
originally announced September 2023.
-
Repeated Builds During Code Review: An Empirical Study of the OpenStack Community
Authors:
Rungroj Maipradit,
Dong Wang,
Patanamon Thongtanunam,
Raula Gaikovina Kula,
Yasutaka Kamei,
Shane McIntosh
Abstract:
Code review is a popular practice where developers critique each other's changes. Since automated builds can identify low-level issues (e.g., syntactic errors, regression bugs), it is not uncommon for software organizations to incorporate automated builds in the code review process. In such code review deployment scenarios, submitted change sets must be approved for integration by both peer code reviewers and automated build bots. Since automated builds may produce an unreliable signal of the status of a change set (e.g., due to "flaky" or non-deterministic execution behaviour), code review tools, such as Gerrit, allow developers to request a "recheck", which repeats the build process without updating the change set. We conjecture that an unconstrained recheck command will waste time and resources if it is not applied judiciously. To explore how the recheck command is applied in a practical setting, in this paper, we conduct an empirical study of 66,932 code reviews from the OpenStack community.
We quantitatively analyze (i) how often build failures are rechecked; (ii) the extent to which invoking recheck changes build failure outcomes; and (iii) how much waste is generated by invoking recheck. We observe that (i) 55% of code reviews invoke the recheck command after a failing build is reported; (ii) invoking the recheck command only changes the outcome of a failing build in 42% of the cases; and (iii) invoking the recheck command increases review waiting time by an average of 2,200% and equates to 187.4 compute years of waste, enough compute resources to compete with the oldest living land animal on Earth.
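Measures (i) and (ii) above can be sketched in Python under a simplified, hypothetical data model: each review is a chronological list of builds, and a build is flagged when it was triggered by a "recheck" comment (same change set, so any outcome change is not caused by new code). The `Build` class and `recheck_stats` function are assumptions for illustration, not the study's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Build:
    passed: bool
    is_recheck: bool  # True if triggered by a "recheck" comment (change set unchanged)


def recheck_stats(reviews: list[list[Build]]) -> tuple[float, float]:
    """Return (share of reviews that recheck after a failing build,
    share of post-failure rechecks whose build then passes)."""
    reviews_with_recheck = 0
    rechecks_after_failure = 0
    flipped = 0
    for builds in reviews:
        rechecked = False
        # Compare each build with the one immediately before it.
        for prev, cur in zip(builds, builds[1:]):
            if cur.is_recheck and not prev.passed:
                rechecked = True
                rechecks_after_failure += 1
                if cur.passed:
                    flipped += 1
        reviews_with_recheck += rechecked
    review_share = reviews_with_recheck / len(reviews) if reviews else 0.0
    flip_rate = flipped / rechecks_after_failure if rechecks_after_failure else 0.0
    return review_share, flip_rate


sample = [
    [Build(False, False), Build(True, True)],                       # recheck flips the failure
    [Build(False, False), Build(False, True), Build(False, True)],  # two rechecks, still failing
    [Build(True, False)],                                           # no recheck at all
]
share, flip = recheck_stats(sample)
```

Every recheck that leaves the outcome unchanged adds a full build's worth of waiting time and compute, which is the waste measured in (iii).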
Submitted 19 August, 2023;
originally announced August 2023.
-
FixMe: A GitHub Bot for Detecting and Monitoring On-Hold Self-Admitted Technical Debt
Authors:
Saranphon Phaithoon,
Supakarn Wongnil,
Patiphol Pussawong,
Morakot Choetkiertikul,
Chaiyong Ragkhitwetsagul,
Thanwadee Sunetnanta,
Rungroj Maipradit,
Hideaki Hata,
Kenichi Matsumoto
Abstract:
Self-Admitted Technical Debt (SATD) is a special form of technical debt in which developers intentionally record their hacks in the code by adding comments for attention. Here, we focus on issue-related "On-hold SATD", where developers suspend proper implementation due to issues reported inside or outside the project. When the referenced issues are resolved, the On-hold SATD also needs to be addressed, but since monitoring these issue reports takes a lot of time and effort, developers may not be aware of the resolved issues and may leave the On-hold SATD in the code. In this paper, we propose FixMe, a GitHub bot that helps developers detect and monitor On-hold SATD in their repositories and notifies them whenever the On-hold SATD is ready to be fixed (i.e., the referenced issues are resolved). The bot can automatically detect On-hold SATD comments in source code using machine learning techniques and discover the referenced issues. When the referenced issues are resolved, the FixMe bot notifies developers. An evaluation with 11 participants shows that FixMe can support them in dealing with On-hold SATD. FixMe is available at https://www.fixmebot.app/ and a demo video of FixMe is at https://youtu.be/YSz9kFxN_YQ.
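The notification step described above reduces to a filter: keep the detected On-hold SATD comments whose referenced issue is now closed. The Python sketch below assumes the comments have already been detected and their issue references extracted; the `OnHoldSATD` class, the "owner/repo#N" reference format, and the message text are hypothetical and do not reflect FixMe's actual implementation or the GitHub API.

```python
from dataclasses import dataclass


@dataclass
class OnHoldSATD:
    path: str
    line: int
    comment: str
    issue_ref: str  # hypothetical reference format, e.g. "owner/repo#123"


def ready_to_fix(satds: list[OnHoldSATD], issue_state: dict[str, str]) -> list[OnHoldSATD]:
    """Keep only On-hold SATD comments whose referenced issue is closed.

    `issue_state` maps an issue reference to "open" or "closed". A real bot
    would query the issue tracker; here the states are passed in directly.
    References missing from the map are treated as not ready.
    """
    return [s for s in satds if issue_state.get(s.issue_ref) == "closed"]


def notification(s: OnHoldSATD) -> str:
    """Render a short, hypothetical notification message for one comment."""
    return (f"{s.path}:{s.line}: {s.issue_ref} is resolved; "
            f"this On-hold SATD may be ready to fix: {s.comment!r}")


satds = [
    OnHoldSATD("src/app.py", 10, "# TODO: wait for owner/lib#7", "owner/lib#7"),
    OnHoldSATD("src/db.py", 42, "# FIXME: blocked by owner/lib#9", "owner/lib#9"),
]
ready = ready_to_fix(satds, {"owner/lib#7": "closed", "owner/lib#9": "open"})
```

Keeping the issue-state lookup outside the filter makes the core logic easy to test without network access.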
Submitted 7 September, 2021;
originally announced September 2021.
-
Automated Identification of On-hold Self-admitted Technical Debt
Authors:
Rungroj Maipradit,
Bin Lin,
Csaba Nagy,
Gabriele Bavota,
Michele Lanza,
Hideaki Hata,
Kenichi Matsumoto
Abstract:
Modern software is developed under considerable time pressure, which implies that developers more often than not have to resort to compromises between code that is well written and code that just does the job. Over the past decades, this has led to the concept of "technical debt", a short-term hack that potentially generates long-term maintenance problems. Self-admitted technical debt (SATD) is a particular form of technical debt: developers consciously perform the hack but also document it in the code by adding comments as a reminder (or as an admission of guilt). We focus on a specific type of SATD, namely "On-hold" SATD, in which developers document in their comments the need to halt an implementation task due to conditions outside of their scope of work (e.g., an open issue must be closed before a function can be implemented). We present an approach, based on regular expressions and machine learning, which is able to detect issues referenced in code comments, and to automatically classify the detected instances as either "On-hold" (the issue is referenced to indicate the need to wait for its resolution before completing a task), or as "cross-reference" (the issue is referenced to document the code, for example to explain the rationale behind an implementation choice). Our approach also mines the issue tracker of the projects to check if the On-hold SATD instances are "superfluous" and can be removed (i.e., the referenced issue has been closed, but the SATD is still in the code). Our evaluation confirms that our approach can indeed identify relevant instances of On-hold SATD. We illustrate its usefulness by identifying superfluous On-hold SATD instances in open source projects, as confirmed by the original developers.
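The regular-expression stage of this approach (finding issue references in comments before any On-hold vs. cross-reference classification) might look like the Python sketch below. The patterns and the `find_issue_refs` function are hypothetical: the abstract does not give the paper's actual expressions, and these are deliberately simple (for example, the Jira-style key pattern would also match strings like "UTF-8").

```python
import re

# Hypothetical patterns, tried in order at each position:
#   1. GitHub issue URLs, e.g. https://github.com/foo/bar/issues/42
#   2. Bare "#123" references not preceded by a word character or "/"
#   3. Jira-style keys, e.g. HADOOP-1234
ISSUE_REF = re.compile(
    r"(?:https?://github\.com/(?P<repo>[\w.-]+/[\w.-]+)/issues/(?P<num1>\d+))"
    r"|(?:(?<![\w/])#(?P<num2>\d+)\b)"
    r"|(?:\b(?P<key>[A-Z][A-Z0-9]+-\d+)\b)"
)


def find_issue_refs(comment: str) -> list[str]:
    """Return normalized issue references found in one code comment."""
    refs = []
    for m in ISSUE_REF.finditer(comment):
        if m.group("num1"):
            refs.append(f"{m.group('repo')}#{m.group('num1')}")
        elif m.group("num2"):
            refs.append(f"#{m.group('num2')}")
        else:
            refs.append(m.group("key"))
    return refs
```

Each extracted reference would then go to the classifier and, for On-hold instances, to an issue-tracker lookup to decide whether the comment has become superfluous.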
Submitted 28 September, 2020;
originally announced September 2020.
-
Sentiment Classification using N-gram IDF and Automated Machine Learning
Authors:
Rungroj Maipradit,
Hideaki Hata,
Kenichi Matsumoto
Abstract:
We propose a sentiment classification method with a general machine learning framework. For feature representation, n-gram IDF is used to extract software-engineering-related, dataset-specific, positive, neutral, and negative n-gram expressions. For classifiers, an automated machine learning tool is used. In the comparison using publicly available datasets, our method achieved the highest F1 values in positive and negative sentences on all datasets.
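As a rough illustration of n-gram IDF weighting, the Python sketch below assumes the formulation IDF(g) = log(|D| · df(g) / df(θ(g))²), where df(g) counts documents containing the contiguous n-gram g and df(θ(g)) counts documents containing all of g's words anywhere. This formula is our assumption, not taken from the abstract. It reduces to ordinary IDF, log(|D| / df(g)), for single words, and it rewards n-grams whose words tend to co-occur as a phrase.

```python
import math


def contains_ngram(tokens: list[str], gram: tuple[str, ...]) -> bool:
    """True if `gram` occurs as a contiguous run of tokens."""
    n = len(gram)
    return any(tuple(tokens[i:i + n]) == gram for i in range(len(tokens) - n + 1))


def ngram_idf(docs: list[list[str]], gram: tuple[str, ...]) -> float:
    """Assumed n-gram IDF: log(|D| * df(g) / df(theta(g)) ** 2)."""
    df_g = sum(contains_ngram(d, gram) for d in docs)
    if df_g == 0:
        raise ValueError(f"n-gram {gram!r} does not occur in the corpus")
    words = set(gram)
    # Documents that contain every word of the n-gram, in any position.
    # df_theta >= df_g > 0, so the division below is safe.
    df_theta = sum(words <= set(d) for d in docs)
    return math.log(len(docs) * df_g / df_theta ** 2)


docs = [
    ["this", "is", "great"],
    ["not", "great"],
    ["great", "job"],
    ["is", "great", "and", "fine"],
]
```

Here ("not", "great") scores log 4 because both words only ever appear together as that phrase, while "great" alone scores 0 because it is in every document. The automated machine learning step that consumes these features is not sketched.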
Submitted 25 May, 2019; v1 submitted 27 April, 2019;
originally announced April 2019.
-
Wait For It: Identifying "On-Hold" Self-Admitted Technical Debt
Authors:
Rungroj Maipradit,
Christoph Treude,
Hideaki Hata,
Kenichi Matsumoto
Abstract:
Self-admitted technical debt refers to situations where a software developer knows that their current implementation is not optimal and indicates this using a source code comment. In this work, we hypothesize that it is possible to develop automated techniques to understand a subset of these comments in more detail, and to propose tool support that can help developers manage self-admitted technical debt more effectively. Based on a qualitative study of 335 comments indicating self-admitted technical debt, we first identify one particular class of debt amenable to automated management: "on-hold" self-admitted technical debt, i.e., debt which contains a condition to indicate that a developer is waiting for a certain event or for updated functionality to be implemented elsewhere. We then design and evaluate an automated classifier which can identify these "on-hold" instances with an area under the receiver operating characteristic curve (AUC) of 0.83, as well as detect the specific conditions that developers are waiting for. Our work presents a first step towards automated tool support that is able to indicate when certain instances of self-admitted technical debt are ready to be addressed.
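The AUC reported above has a direct probabilistic reading: it is the chance that a randomly chosen "on-hold" comment receives a higher classifier score than a randomly chosen other comment, with ties counting half. The Python sketch below computes it that way from labels and scores. The function name `roc_auc` and the toy data are illustrative; the classifier that produces the scores is not shown.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of random positive > score of random negative).

    Ties count as half a win. This pairwise version is O(P * N), which is
    fine for small evaluation sets such as a few hundred comments.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Label 1 = "on-hold", 0 = other self-admitted technical debt (toy data).
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the toy data, three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75. Under this reading, an AUC of 0.83 means the classifier ranks an on-hold comment above a non-on-hold one in about 83% of such pairs.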
Submitted 21 October, 2019; v1 submitted 27 January, 2019;
originally announced January 2019.