Problems with SZZ and Features: An empirical study of the state of practice of defect prediction data collection

Herbold, Steffen; Trautsch, Alexander; Trautsch, Fabian; Ledel, Benjamin

doi:10.1007/s10664-021-10092-4

arXiv:1911.08938 (cs)

[Submitted on 20 Nov 2019 (v1), last revised 11 Nov 2021 (this version, v3)]

Abstract: Context: The SZZ algorithm is the de facto standard for labeling bug-fixing commits and finding inducing changes for defect prediction data. Recent research uncovered potential problems in different parts of the SZZ algorithm. Most defect prediction data sets provide only static code metrics as features, while research indicates that other features are also important.
Objective: We provide an empirical analysis of the defect labels created with the SZZ algorithm and the impact of commonly used features on results.
Method: We used a combination of manual validation and adopted or improved heuristics for the collection of defect data. We conducted an empirical study on 398 releases of 38 Apache projects.
Results: We found that only half of the bug-fixing commits determined by SZZ are actually bug fixing. If a six-month time frame is used in combination with SZZ to determine which bugs affect a release, one file is incorrectly labeled as defective for every file that is correctly labeled as defective. In addition, two defective files are missed. We also explored the impact of the relatively small set of features that are available in most defect prediction data sets, as multiple publications indicate that, e.g., churn-related features are important for defect prediction. We found that using more features does not make a significant difference.
Conclusion: Problems with inaccurate defect labels are a severe threat to the validity of the state of the art of defect prediction. Small feature sets seem to be a less severe threat.

Comments:	Accepted at Empirical Software Engineering, Springer. The first three authors contributed equally.
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:1911.08938 [cs.SE]
	(or arXiv:1911.08938v3 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.1911.08938
Related DOI:	https://doi.org/10.1007/s10664-021-10092-4

Submission history

From: Steffen Herbold
[v1] Wed, 20 Nov 2019 14:41:21 UTC (145 KB)
[v2] Fri, 14 Feb 2020 13:35:39 UTC (297 KB)
[v3] Thu, 11 Nov 2021 11:07:05 UTC (1,259 KB)
