Revisiting Process versus Product Metrics: a Large Scale Analysis

Majumder, Suvodeep; Mody, Pranav; Menzies, Tim

doi:10.1007/s10664-021-10068-4


arXiv:2008.09569 (cs)

[Submitted on 21 Aug 2020 (v1), last revised 26 Oct 2021 (this version, v3)]

Abstract: Numerous methods can build predictive models from software data. However, what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)?
To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (best process/product-based learners respectively achieve recalls of 98%/44% and AUCs of 95%/54%, median values).
That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single model predictions can become confused and exhibit a high variance).
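
As a rough illustration of why combining predictions from several models can be steadier than relying on one, the following minimal sketch applies a majority vote to binary defect predictions and compares recall. It is not the authors' method: the toy labels, the three model outputs, and the helper names `recall` and `majority_vote` are all hypothetical and chosen only for this example.

```python
def recall(actual, predicted):
    """Fraction of truly defective items (label 1) that were predicted defective."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    actual_pos = sum(1 for a in actual if a == 1)
    return true_pos / actual_pos if actual_pos else 0.0


def majority_vote(predictions_per_model):
    """Combine binary predictions from several models; ties count as defective."""
    n_models = len(predictions_per_model)
    return [1 if 2 * sum(votes) >= n_models else 0
            for votes in zip(*predictions_per_model)]


# Hypothetical toy data: true labels for 6 commits and the outputs of 3 models.
actual = [1, 0, 1, 1, 0, 1]
model_outputs = [
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 1],
]

combined = majority_vote(model_outputs)
single_recalls = [recall(actual, m) for m in model_outputs]
combined_recall = recall(actual, combined)

print("single-model recalls:", single_recalls)
print("combined recall:", combined_recall)
```

In this toy data each model misses a different defective commit, so the vote recovers what any single model loses. Real projects will not be this tidy, and the size of the benefit depends on how correlated the models' errors are.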

Comments:	36 pages, 12 figures and 5 tables
Subjects:	Software Engineering (cs.SE); Machine Learning (cs.LG)
Cite as:	arXiv:2008.09569 [cs.SE]
	(or arXiv:2008.09569v3 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2008.09569
Journal reference:	Empirical Software Engineering, Volume 27, Issue 3, May 2022
Related DOI:	https://doi.org/10.1007/s10664-021-10068-4

Submission history

From: Suvodeep Majumder
[v1] Fri, 21 Aug 2020 16:26:22 UTC (10,650 KB)
[v2] Tue, 20 Oct 2020 19:23:22 UTC (6,116 KB)
[v3] Tue, 26 Oct 2021 13:50:46 UTC (2,116 KB)
