Showing 1–1 of 1 results for author: Tontchev, M

  1. arXiv:2312.06674  [pdf, other]

    cs.CL cs.AI

    Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

    Authors: Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa

    Abstract: We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). This taxonomy is also instrumental in classifying the responses generated by LLMs to these prompts, a process we refer to…

    Submitted 7 December, 2023; originally announced December 2023.