
Showing 1–6 of 6 results for author: Maloyan, N

  1. arXiv:2505.13348  [pdf, ps, other]

    cs.CL

    Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks

    Authors: Narek Maloyan, Bislan Ashinov, Dmitry Namiot

    Abstract: Large Language Models (LLMs) are increasingly employed as evaluators (LLM-as-a-Judge) for assessing the quality of machine-generated text. This paradigm offers scalability and cost-effectiveness compared to human annotation. However, the reliability and security of such systems, particularly their robustness against adversarial manipulations, remain critical concerns. This paper investigates the v…

    Submitted 19 May, 2025; originally announced May 2025.

  2. arXiv:2504.18333  [pdf, ps, other]

    cs.CR cs.CL

    Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections

    Authors: Narek Maloyan, Dmitry Namiot

    Abstract: LLM-as-a-judge systems used to assess text quality, code correctness, and argument strength are vulnerable to prompt injection attacks. We introduce a framework that separates content-author attacks from system-prompt attacks and evaluate five models (Gemma 3 27B, Gemma 3 4B, Llama 3.2 3B, GPT-4, and Claude 3 Opus) on four tasks with various defenses, using fifty prompts per condition. Attacks achieved up to…

    Submitted 25 April, 2025; originally announced April 2025.

  3. Prompt Injection Attacks in Defended Systems

    Authors: Daniil Khomsky, Narek Maloyan, Bulat Nutfullin

    Abstract: Large language models play a crucial role in modern natural language processing technologies. However, their extensive use also introduces potential security risks, such as the possibility of black-box attacks. These attacks can embed hidden malicious features into the model, leading to adverse consequences during its deployment. This paper investigates methods for black-box attacks on large lan…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2404.13660  [pdf, other]

    cs.CL

    Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge

    Authors: Narek Maloyan, Ekansh Verma, Bulat Nutfullin, Bislan Ashinov

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains, but their vulnerability to trojan or backdoor attacks poses significant security risks. This paper explores the challenges and insights gained from the Trojan Detection Competition 2023 (TDC2023), which focused on identifying and evaluating trojan attacks on LLMs. We investigate the difficulty of distinguish…

    Submitted 21 April, 2024; originally announced April 2024.

  5. DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning

    Authors: Daniil Homskiy, Narek Maloyan

    Abstract: In recent years, sentiment analysis has gained significant importance in natural language processing. However, most existing models and datasets for sentiment analysis are developed for high-resource languages, such as English and Chinese, leaving low-resource languages, particularly African languages, largely unexplored. The AfriSenti-SemEval 2023 Shared Task 12 aims to fill this gap by evaluatin…

    Submitted 4 May, 2023; originally announced May 2023.

  6. DIALOG-22 RuATD Generated Text Detection

    Authors: Narek Maloyan, Bulat Nutfullin, Eugene Ilyushin

    Abstract: Text Generation Models (TGMs) succeed in creating text that matches human language style reasonably well. Detectors that can distinguish between TGM-generated text and human-written ones play an important role in preventing abuse of TGM. In this paper, we describe our pipeline for the two DIALOG-22 RuATD tasks: detecting generated text (binary task) and classification of which model was used to…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: 6 pages