On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study

Swarna, Karthik Chandra; Mathews, Noble Saji; Vagavolu, Dheeraj; Chimalakonda, Sridhar

Computer Science > Software Engineering

arXiv:2106.10918 (cs)

[Submitted on 21 Jun 2021 (v1), last revised 24 Dec 2023 (this version, v5)]

Abstract: Efficiently representing source code is crucial for various software engineering tasks such as code classification and clone detection. Existing approaches primarily use the Abstract Syntax Tree (AST), and only a few focus on semantic graphs such as the Control Flow Graph (CFG) and the Program Dependency Graph (PDG), which contain information about source code that the AST does not. Even though some works have tried to utilize multiple representations, they do not provide insights into the costs and benefits of doing so. The primary goal of this paper is to discuss the implications of utilizing multiple code representations, specifically AST, CFG, and PDG. We modify an AST path-based approach to accept multiple representations as input to an attention-based model, in order to measure the impact of additional representations (such as CFG and PDG) over AST alone. We evaluate our approach on three tasks: Method Naming, Program Classification, and Clone Detection. Over the baseline, our approach improves performance on these tasks by 11% (F1), 15.7% (Accuracy), and 9.3% (F1), respectively. In addition to the effect on performance, we discuss the timing overheads incurred by using multiple representations. We envision this work providing researchers with a lens to evaluate combinations of code representations for various tasks.
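The idea of feeding several representations into one attention-based model can be illustrated with a small sketch. It assumes a code2vec-style design, which the abstract does not confirm (it only says "an AST path-based approach"): each representation (AST, CFG, PDG) contributes a bag of (start, path, end) path-contexts, every context is embedded, and a single global attention vector weights all contexts before they are summed into one code vector. The names `code_vector`, `context_vector`, `embed`, and `DIM`, the path strings, the random (untrained) embeddings, and the sum-then-tanh combination are hypothetical simplifications for illustration, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical sketch: a path-context is (start token, path, end token).
PathContext = Tuple[str, str, str]

DIM = 8  # hypothetical embedding size; real models use far more dimensions


def embed(item: str, table: Dict[str, List[float]], rng: random.Random) -> List[float]:
    """Return a random (untrained) embedding for a token or path, creating it on first use."""
    if item not in table:
        table[item] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return table[item]


def context_vector(ctx: PathContext, table: Dict[str, List[float]], rng: random.Random) -> List[float]:
    """Combine start, path, and end embeddings (simplified here: element-wise sum, then tanh)."""
    parts = [embed(item, table, rng) for item in ctx]
    return [math.tanh(sum(p[i] for p in parts)) for i in range(DIM)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def code_vector(bags: Dict[str, List[PathContext]], seed: int = 0) -> Tuple[List[float], List[float]]:
    """Attend over path-contexts pooled from all representations; return (code vector, weights)."""
    rng = random.Random(seed)
    table: Dict[str, List[float]] = {}
    # Global attention vector: learned during training in a real model, random here.
    attention = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    # Pool contexts from every available representation into a single bag.
    contexts = [ctx for rep in ("AST", "CFG", "PDG") for ctx in bags.get(rep, [])]
    if not contexts:
        raise ValueError("at least one path-context is required")
    vecs = [context_vector(ctx, table, rng) for ctx in contexts]
    scores = [sum(a * v for a, v in zip(attention, vec)) for vec in vecs]
    weights = softmax(scores)
    # Attention-weighted sum of context vectors gives the program-level code vector.
    code = [sum(w * vec[i] for w, vec in zip(weights, vecs)) for i in range(DIM)]
    return code, weights


# Hypothetical path-contexts for a tiny program, one per representation.
bags = {
    "AST": [("x", "Name^Assign_Num", "0")],
    "CFG": [("entry", "Assign->Return", "exit")],
    "PDG": [("x", "DataDep", "return")],
}
vec, weights = code_vector(bags)
print(len(vec), round(sum(weights), 6))  # 8 1.0
```

Pooling all contexts into one bag lets a single attention distribution decide how much each representation contributes. Keeping a separate attention step per representation and concatenating the results would be an equally plausible alternative; the abstract does not say which design the authors chose.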

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2106.10918 [cs.SE]
	(or arXiv:2106.10918v5 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2106.10918

Submission history

From: Karthik Chandra Swarna
[v1] Mon, 21 Jun 2021 08:36:38 UTC (2,723 KB)
[v2] Fri, 18 Mar 2022 05:01:54 UTC (1,857 KB)
[v3] Sun, 26 Mar 2023 19:36:57 UTC (4,311 KB)
[v4] Sat, 1 Apr 2023 21:07:02 UTC (9,011 KB)
[v5] Sun, 24 Dec 2023 17:24:51 UTC (5,068 KB)
