ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

Lai-Lopez, Nicole; Wang, Lusha; Yuan, Su; Zhang, Liza

Computer Science > Computation and Language

arXiv:2507.18769 (cs)

[Submitted on 24 Jul 2025]

Title:ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

Authors:Nicole Lai-Lopez, Lusha Wang, Su Yuan, Liza Zhang

View PDF HTML (experimental)

Abstract:In this work, we introduce our solution for the Multilingual Text Detoxification Task in the PAN-2025 competition for the ylmmcl team: a robust multilingual text detoxification pipeline that integrates lexicon-guided tagging, a fine-tuned sequence-to-sequence model (s-nlp/mt0-xl-detox-orpo) and an iterative classifier-based gatekeeping mechanism. Our approach departs from prior unsupervised or monolingual pipelines by leveraging explicit toxic word annotation via the multilingual_toxic_lexicon to guide detoxification with greater precision and cross-lingual generalization. Our final model achieves the highest STA (0.922) from our previous attempts, and an average official J score of 0.612 for toxic inputs in both the development and test sets. It also achieved xCOMET scores of 0.793 (dev) and 0.787 (test). This performance outperforms baseline and backtranslation methods across multiple languages, and shows strong generalization in high-resource settings (English, Russian, French). Despite some trade-offs in SIM, the model demonstrates consistent improvements in detoxification strength. In the competition, our team achieved ninth place with a score of 0.612.

Comments:	16 pages, 5 figures, 3 tables,
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2507.18769 [cs.CL]
	(or arXiv:2507.18769v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2507.18769

Submission history

From: Nicole Lai-Lopez [view email]
[v1] Thu, 24 Jul 2025 19:38:15 UTC (659 KB)

Computer Science > Computation and Language

Title:ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators