On the Ability of LLMs to Handle Character-Level Perturbations: How Well and How?

Zhuo, Anyuan; Ning, Xuefei; Li, Ningyuan; Wang, Yu; Lu, Pinyan


arXiv:2510.14365 (cs)

[Submitted on 16 Oct 2025 (v1), last revised 17 Oct 2025 (this version, v2)]

Abstract: This work investigates the resilience of contemporary LLMs against frequent and structured character-level perturbations, specifically the insertion of noisy characters after each input character. We introduce UCC-Inj, a practical method that inserts invisible Unicode control characters into text to discourage LLM misuse in scenarios such as online exam systems. Surprisingly, despite strong obfuscation that fragments tokenization and significantly reduces the signal-to-noise ratio, many LLMs still maintain notable performance. Through comprehensive evaluation across model-, problem-, and noise-related configurations, we examine the extent and mechanisms of this robustness, exploring both how models handle character-level tokenization and whether they denoise character-level noise implicitly or explicitly. We hope our findings on the low-level robustness of LLMs will shed light on the risks of their misuse and on the reliability of deploying LLMs across diverse applications.
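
As a rough illustration of the perturbation the abstract describes (one noisy character inserted after each input character), the sketch below may help. It is not the authors' implementation of UCC-Inj: the abstract does not say which code points are used, so the pool of zero-width format characters, the names `INVISIBLE_CHARS`, `inject_noise`, and `strip_noise`, and the seeded random choice are all assumptions made for illustration.

```python
import random

# Assumption: an illustrative pool of invisible Unicode characters
# (zero-width space, zero-width non-joiner, zero-width joiner, word joiner).
# The actual characters used by UCC-Inj are not given in the abstract.
INVISIBLE_CHARS = ["\u200b", "\u200c", "\u200d", "\u2060"]


def inject_noise(text: str, pool=INVISIBLE_CHARS, seed: int = 0) -> str:
    """Insert one randomly chosen invisible character after each input character."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return "".join(ch + rng.choice(pool) for ch in text)


def strip_noise(text: str, pool=INVISIBLE_CHARS) -> str:
    """Explicit denoising: drop every character that belongs to the noise pool."""
    noise = set(pool)
    return "".join(ch for ch in text if ch not in noise)


if __name__ == "__main__":
    question = "What is 2 + 2?"
    noisy = inject_noise(question)
    # The noisy string renders like the original but is twice as long.
    print(len(question), len(noisy))
    print(strip_noise(noisy) == question)
```

Under this scheme half of the characters in the perturbed string are noise, which is one way to read the reduced signal-to-noise ratio mentioned above. `strip_noise` corresponds only to the trivial explicit denoising a preprocessing step could apply; it says nothing about how an LLM handles such input internally.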

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2510.14365 [cs.CL]
	(or arXiv:2510.14365v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2510.14365

Submission history

From: Anyuan Zhuo
[v1] Thu, 16 Oct 2025 06:59:58 UTC (317 KB)
[v2] Fri, 17 Oct 2025 08:48:25 UTC (309 KB)
