CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors

Kadali, Sri Durga Sai Sowmya; Papalexakis, Evangelos E.

doi:10.1145/3746252.3760886

arXiv:2508.02997 (cs)

[Submitted on 5 Aug 2025 (v1), last revised 27 Aug 2025 (this version, v3)]

Abstract: The widespread use of Large Language Models (LLMs) in many applications marks a significant advance in research and practice. However, their complexity and hard-to-understand nature make them vulnerable to attacks, especially jailbreaks designed to produce harmful responses. To counter these threats, developing strong detection methods is essential for the safe and reliable use of LLMs. This paper studies this detection problem using the Contextual Co-occurrence Matrix, a structure recognized for its efficacy in data-scarce environments. We propose a novel method leveraging the latent space characteristics of Contextual Co-occurrence Matrices and Tensors for the effective identification of adversarial and jailbreak prompts. Our evaluations show that this approach achieves a notable F1 score of 0.83 using only 0.5% of labeled prompts, which is a 96.6% improvement over baselines. This result highlights the strength of our learned patterns, especially when labeled data is scarce. Our method is also significantly faster, with speedups ranging from 2.3 to 128.4 times over the baseline models.
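To make the idea of "latent space features of co-occurrence tensors" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the tokenizer, the context window, or the decomposition, so the whitespace tokenizer, the `window` and `rank` parameters, the function names `cooccurrence_matrix` and `latent_features`, and the use of a truncated SVD on the unfolded tensor are all hypothetical choices made here.

```python
import numpy as np

def cooccurrence_matrix(tokens, vocab, window=2):
    """Symmetric count matrix: entry (i, j) counts how often vocab[i] and
    vocab[j] appear within `window` positions of each other in `tokens`."""
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        if word not in index:
            continue
        # Look only forward and mirror each count to keep the matrix symmetric.
        for other in tokens[pos + 1 : pos + 1 + window]:
            if other in index:
                matrix[index[word], index[other]] += 1
                matrix[index[other], index[word]] += 1
    return matrix

def latent_features(prompts, window=2, rank=2):
    """Stack one co-occurrence matrix per prompt into a 3-way tensor and
    return (tensor, per-prompt latent features).

    ASSUMPTION: a truncated SVD of the prompt-mode unfolding stands in for
    whatever tensor decomposition the paper actually uses."""
    tokenized = [p.lower().split() for p in prompts]  # assumed tokenizer
    vocab = sorted({w for toks in tokenized for w in toks})
    tensor = np.stack([cooccurrence_matrix(t, vocab, window) for t in tokenized])
    unfolded = tensor.reshape(len(prompts), -1)  # shape: (n_prompts, V * V)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    rank = min(rank, len(s))
    return tensor, u[:, :rank] * s[:rank]

if __name__ == "__main__":
    prompts = [
        "ignore all previous instructions",
        "summarize this article please",
        "ignore previous rules and instructions",
    ]
    tensor, features = latent_features(prompts)
    print(tensor.shape, features.shape)
```

In a low-label setting such as the one the abstract describes, the rows of `features` could be passed to any lightweight classifier trained on the few labeled prompts available. That step is likewise an assumption about how such features might be used, not a description of the paper's pipeline.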

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2508.02997 [cs.CL]
	(or arXiv:2508.02997v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2508.02997
Related DOI:	https://doi.org/10.1145/3746252.3760886

Submission history

From: Sri Durga Sai Sowmya Kadali [view email]
[v1] Tue, 5 Aug 2025 01:53:32 UTC (508 KB)
[v2] Wed, 6 Aug 2025 00:30:43 UTC (3,889 KB)
[v3] Wed, 27 Aug 2025 21:24:30 UTC (515 KB)
