Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models

Madhusudhan, Nishanth; Madhusudhan, Sathwik Tejaswi; Yadav, Vikas; Hashemi, Masoud

arXiv:2407.16221 (cs)

[Submitted on 23 Jul 2024 (v1), last revised 24 Sep 2024 (this version, v2)]

Abstract: Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability, referring to an LLM's capability to withhold responses when uncertain or lacking a definitive answer, without compromising performance. Although previous studies have attempted to improve AA, they lack a standardised evaluation method and remain unsuitable for black-box models where token prediction probabilities are inaccessible. This makes comparative analysis challenging, especially for state-of-the-art closed-source commercial LLMs. This paper bridges this gap by introducing a black-box evaluation approach and a new dataset, Abstain-QA, crafted to rigorously assess AA across varied question types (answerable and unanswerable), domains (well-represented and under-represented), and task types (fact-centric and reasoning). We also propose a new confusion matrix, the "Answerable-Unanswerable Confusion Matrix" (AUCM), which serves as the basis for evaluating AA by offering a structured and precise approach to assessment. Finally, we explore the impact of three prompting strategies on improving AA: Strict Prompting, Verbal Confidence Thresholding, and Chain-of-Thought (CoT). Our results indicate that even powerful models such as GPT-4 and Mixtral 8x22b encounter difficulties with abstention; however, strategic approaches such as Strict Prompting and CoT can enhance this capability.
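
As a rough illustration of the kind of bookkeeping a confusion matrix over answerable and unanswerable questions involves, the sketch below tallies a 2x2 table (answerable or unanswerable, answered or abstained) and derives precision and recall for abstention. The abstract does not define the AUCM's cells or metrics, so the layout, the function names `tally` and `abstention_precision_recall`, and the sample records are all assumptions made for illustration, not the paper's definitions.

```python
from collections import Counter


def tally(records):
    """Count (answerable, abstained) pairs.

    Assumed layout, not the paper's AUCM definition: each record is a
    pair of booleans (is the question answerable, did the model abstain).
    """
    counts = Counter()
    for answerable, abstained in records:
        counts[(bool(answerable), bool(abstained))] += 1
    return counts


def abstention_precision_recall(counts):
    """Treat 'abstained' as the positive prediction and
    'unanswerable' as the positive class."""
    tp = counts[(False, True)]   # unanswerable question, model abstained
    fp = counts[(True, True)]    # answerable question, model abstained
    fn = counts[(False, False)]  # unanswerable question, model answered
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical records: (answerable, abstained)
records = [
    (True, False), (True, False), (True, True),
    (False, True), (False, True), (False, False),
]
counts = tally(records)
precision, recall = abstention_precision_recall(counts)
print(dict(counts))
print(f"abstention precision={precision:.2f}, recall={recall:.2f}")
```

Because the tally only needs the model's final response and a gold answerable/unanswerable label, a scheme of this shape works for black-box models where token probabilities are unavailable.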

Comments:	8 pages (excluding limitations, references and appendix) and 5 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.16221 [cs.CL]
	(or arXiv:2407.16221v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.16221

Submission history

From: Nishanth Madhusudhan
[v1] Tue, 23 Jul 2024 06:56:54 UTC (7,235 KB)
[v2] Tue, 24 Sep 2024 14:25:58 UTC (811 KB)
