PromptShield: Deployable Detection for Prompt Injection Attacks
Authors:
Dennis Jacob,
Hend Alzahrani,
Zhanhao Hu,
Basel Alomair,
David Wagner
Abstract:
Application designers have moved to integrate large language models (LLMs) into their products. However, many LLM-integrated applications are vulnerable to prompt injections. While attempts have been made to address this problem by building prompt injection detectors, many are not yet suitable for practical deployment. To support research in this area, we introduce PromptShield, a benchmark for training and evaluating deployable prompt injection detectors. Our benchmark is carefully curated and includes both conversational and application-structured data. In addition, we use insights from our curation process to fine-tune a new prompt injection detector that achieves significantly higher performance in the low false positive rate (FPR) evaluation regime compared to prior schemes. Our work suggests that careful curation of training data and larger models can contribute to strong detector performance.
Submitted 11 April, 2025; v1 submitted 25 January, 2025;
originally announced January 2025.
Can LLMs Ask Good Questions?
Authors:
Yueheng Zhang,
Xiaoyuan Liu,
Yiyou Sun,
Atheer Alharbi,
Hend Alzahrani,
Tianneng Shi,
Basel Alomair,
Dawn Song
Abstract:
We evaluate questions generated by large language models (LLMs) from context, comparing them to human-authored questions across six dimensions: question type, question length, context coverage, answerability, uncommonness, and required answer length. Our study spans two open-source and two proprietary state-of-the-art models. Results reveal that LLM-generated questions tend to demand longer descriptive answers and exhibit more evenly distributed context focus, in contrast to the positional bias often seen in QA tasks. These findings provide insights into the distinctive characteristics of LLM-generated questions and inform future work on question quality and downstream applications.
Submitted 17 June, 2025; v1 submitted 6 January, 2025;
originally announced January 2025.