OCEAN: Open-World Contrastive Authorship Identification

Mächtle, Felix; Serr, Jan-Niclas; Loose, Nils; Sander, Jonas; Eisenbarth, Thomas

arXiv:2412.05049 (cs)

[Submitted on 6 Dec 2024]


Abstract: In an era where cyberattacks increasingly target the software supply chain, the ability to accurately attribute code authorship in binary files is critical to improving cybersecurity measures. We propose OCEAN, a contrastive learning-based system for function-level authorship attribution. OCEAN is the first framework to explore code authorship attribution on compiled binaries in an open-world and extreme scenario, where two code samples from unknown authors are compared to determine if they are developed by the same author. To evaluate OCEAN, we introduce new realistic datasets: CONAN, to improve the performance of authorship attribution systems in real-world use cases, and SNOOPY, to increase the robustness of the evaluation of such systems. We use CONAN to train our model and evaluate on SNOOPY, a fully unseen dataset, resulting in an AUROC score of 0.86 even when using high compiler optimizations. We further show that CONAN improves performance by 7% compared to the previously used Google Code Jam dataset. Additionally, OCEAN outperforms previous methods in their settings, achieving a 10% improvement over state-of-the-art SCS-Gan in scenarios analyzing source code. Furthermore, OCEAN can detect code injections from an unknown author in a software update, underscoring its value for securing software supply chains.
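
The open-world setting in the abstract (deciding whether two samples share an author, then scoring with AUROC) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy 2-D vectors stand in for embeddings that a trained model would produce, the function names and author labels are hypothetical, and cosine similarity is a common choice for contrastive embeddings rather than a scoring function the abstract confirms.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def auroc(scores, labels):
    """Probability that a random same-author pair scores higher than a
    random different-author pair (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical hand-made embeddings: (vector, author label).
# In a real system the vectors would come from the trained encoder,
# and the authors would be unseen during training.
embeddings = {
    "f1": ([1.0, 0.1], "A"),
    "f2": ([0.9, 0.2], "A"),
    "f3": ([0.1, 1.0], "B"),
    "f4": ([0.2, 0.9], "B"),
}

# Score every unordered pair; label 1 means "same author".
scores, labels = [], []
for (_, (u, author_u)), (_, (v, author_v)) in combinations(embeddings.items(), 2):
    scores.append(cosine_similarity(u, v))
    labels.append(1 if author_u == author_v else 0)

pairwise_auroc = auroc(scores, labels)
print(f"pairwise AUROC on toy data: {pairwise_auroc:.2f}")
```

Because the decision is made per pair, no author needs to be known in advance; a threshold on the similarity score turns the ranking into a same-author or different-author verdict.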

Comments:	Accepted at Applied Cryptography and Network Security (ACNS) 2025
Subjects:	Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2412.05049 [cs.AI]
	(or arXiv:2412.05049v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2412.05049

Submission history

From: Felix Maechtle
[v1] Fri, 6 Dec 2024 14:02:51 UTC (435 KB)
