TrojanedCM: A Repository for Poisoned Neural Models of Source Code

Hussain, Aftab; Rabin, Md Rafiqul Islam; Alipour, Mohammad Amin

Computer Science > Software Engineering

arXiv:2311.14850v1 (cs)

[Submitted on 24 Nov 2023 (this version), latest version 11 Dec 2023 (v2)]

Abstract: With the rapid growth of research in trojaning deep neural models of source code, we observe a need for a benchmark of trojaned models for testing various trojan detection and unlearning techniques. In this work, we aim to provide the scientific community with a diverse pool of trojaned code models with which they can experiment with such techniques. We present TrojanedCM, a publicly available repository of clean and poisoned models of source code. We provide poisoned models for two code classification tasks (defect detection and clone detection) and a code generation task (text-to-code generation). We finetuned popular pretrained code models such as CodeBERT, PLBART, CodeT5, and CodeT5+ on poisoned datasets that we generated from benchmark datasets (Devign, BigCloneBench, CONCODE) for the above-mentioned tasks. The repository also provides full access to the architecture and weights of the models, allowing practitioners to investigate different white-box analysis techniques. In addition to the poisoned models, we also provide a poisoning framework with which practitioners can deploy various poisoning strategies for the different tasks and models of source code. All the materials are accessible via this link: this https URL.

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2311.14850 [cs.SE]
	(or arXiv:2311.14850v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2311.14850

Submission history

From: Aftab Hussain
[v1] Fri, 24 Nov 2023 21:58:06 UTC (59 KB)
[v2] Mon, 11 Dec 2023 20:07:35 UTC (1,160 KB)
