Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations

Shang, Xiuwei; Hu, Li; Cheng, Shaoyin; Chen, Guoqiang; Wu, Benlong; Zhang, Weiming; Yu, Nenghai


arXiv:2410.18561 (cs)

[Submitted on 24 Oct 2024]


Abstract: Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification. As IoT devices proliferate and rapidly evolve, their highly heterogeneous hardware architectures and complex compilation settings, coupled with the demand for large-scale function retrieval in practical applications, place greater demands on BCSD methods. In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR, a representation with higher-level semantic abstraction, and integrates a pre-trained language model with a graph neural network to capture both semantic and structural information from different perspectives. By introducing momentum contrastive learning, it enhances retrieval in large-scale candidate function sets and distinguishes subtle similarities and differences between functions. Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both one-to-one comparison and one-to-many search scenarios.
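The abstract names momentum contrastive learning but gives no implementation detail. As a rough illustration only, the sketch below shows the generic MoCo-style ingredients that term usually refers to: an exponential-moving-average update of a key encoder, an InfoNCE loss over one positive and a queue of negatives, and a FIFO queue update. This is not IRBinDiff's code. The function names, the temperature of 0.07, the momentum of 0.999, and the use of plain NumPy arrays in place of real function embeddings are all assumptions made for this example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def momentum_update(query_params, key_params, m=0.999):
    """EMA update: key <- m * key + (1 - m) * query (hypothetical helper)."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """InfoNCE loss with one positive per query and a queue of negatives.

    q:     (B, D) query embeddings, L2-normalized
    k_pos: (B, D) positive key embeddings, L2-normalized
    queue: (K, D) negative key embeddings, L2-normalized
    """
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)        # (B, 1)
    l_neg = q @ queue.T                                      # (B, K)
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[:, 0].mean()                            # positive is column 0

def enqueue_dequeue(queue, new_keys):
    """FIFO queue update: drop the oldest keys, append the newest ones."""
    return np.concatenate([queue[new_keys.shape[0]:], new_keys], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = l2_normalize(rng.normal(size=(4, 8)))       # stand-in query embeddings
    queue = l2_normalize(rng.normal(size=(16, 8)))  # stand-in negatives
    print("loss with matching positives:", info_nce_loss(q, q, queue))
    queue = enqueue_dequeue(queue, q)
    print("queue shape after update:", queue.shape)
```

In a retrieval setting, the queue lets each query be contrasted against many more negatives than fit in one batch, which is one plausible reason such a scheme would help a one-to-many search; how IRBinDiff configures this is described in the paper itself, not here.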

Comments:	13 pages, 10 figures
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2410.18561 [cs.SE]
	(or arXiv:2410.18561v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2410.18561

Submission history

From: Xiuwei Shang
[v1] Thu, 24 Oct 2024 09:09:20 UTC (929 KB)
