PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health

Jin, Haoan; Chen, Siyuan; Wu, Mengyue; Zhu, Kenny Q.

arXiv:2311.09189v1 (cs)

[Submitted on 15 Nov 2023 (this version), latest version 3 Jun 2024 (v2)]

Abstract: Interest in applying large language models (LLMs) to mental health research has grown recently, with studies showcasing remarkable capabilities such as disease detection. However, no comprehensive benchmark currently exists for evaluating LLMs in this domain. We address this gap by introducing the first comprehensive benchmark tailored to the unique characteristics of the mental health domain. The benchmark comprises six sub-tasks covering three dimensions, designed to systematically assess the capabilities of LLMs in mental health. We design a concise prompt for each sub-task and use the benchmark to evaluate eight advanced LLMs. The experimental results show that current LLMs have significant room for improvement in mental health and point to potential directions for future model optimization.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2311.09189 [cs.CL]
	(or arXiv:2311.09189v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.09189

Submission history

From: Haoan Jin
[v1] Wed, 15 Nov 2023 18:32:27 UTC (674 KB)
[v2] Mon, 3 Jun 2024 08:37:10 UTC (3,927 KB)
