STEPS: A Benchmark for Order Reasoning in Sequential Tasks

Wang, Weizhi; Wang, Hong; Yan, Xifeng

arXiv:2306.04441 (cs)

[Submitted on 7 Jun 2023]

Abstract: Many human activities, e.g. cooking, repairing, and manufacturing, can be abstracted into a sequence of actions described in natural text. Such action sequences depend heavily on their execution order, and a disordered sequence causes robots or AI agents to fail at the rest of the task. To verify the order reasoning capability of current neural models in sequential tasks, we propose a challenging benchmark named STEPS. STEPS comprises two subtask settings: determining whether a given next step in a recipe is rational, and selecting the reasonable step in a multiple-choice question. We describe the data construction and task formulations, and benchmark most of the significant Large Language Models (LLMs). The experimental results demonstrate that 1) commonsense reasoning about action order in sequential tasks is challenging for LLMs to resolve via zero-shot prompting or few-shot in-context learning; and 2) prompting methods still lag significantly behind tuning-based methods on STEPS.
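The two subtask settings named in the abstract can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, field names, toy recipe steps, and trivial baselines are hypothetical and are not taken from the STEPS data release or the authors' code. Accuracy is used only as a plausible metric for both settings.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class NextStepExample:
    """Subtask 1 (hypothetical schema): is `candidate` a rational next step?"""
    previous_steps: List[str]
    candidate: str
    label: bool  # True if the candidate is a reasonable next step


@dataclass
class MultiChoiceExample:
    """Subtask 2 (hypothetical schema): pick the reasonable step among options."""
    previous_steps: List[str]
    options: List[str]
    answer_index: int  # index of the reasonable option


def accuracy(predictions: Sequence, golds: Sequence) -> float:
    """Fraction of predictions that equal their gold labels."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)


# Toy examples written for this sketch, not drawn from the benchmark.
binary_examples = [
    NextStepExample(["Boil water.", "Add pasta."], "Drain the pasta.", True),
    NextStepExample(["Boil water."], "Drain the pasta.", False),
]
choice_examples = [
    MultiChoiceExample(
        ["Crack the eggs into a bowl."],
        ["Serve the omelette.", "Whisk the eggs."],
        1,
    ),
]


def always_yes(example: NextStepExample) -> bool:
    """Trivial baseline: accept every candidate step."""
    return True


def first_option(example: MultiChoiceExample) -> int:
    """Trivial baseline: always choose the first option."""
    return 0


binary_acc = accuracy(
    [always_yes(e) for e in binary_examples],
    [e.label for e in binary_examples],
)
choice_acc = accuracy(
    [first_option(e) for e in choice_examples],
    [e.answer_index for e in choice_examples],
)
print(binary_acc, choice_acc)
```

In a real evaluation, the two baseline functions would be replaced by a prompted or fine-tuned model that reads `previous_steps` and returns a label or an option index.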

Comments:	Work in Progress
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2306.04441 [cs.CL]
	(or arXiv:2306.04441v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.04441

Submission history

From: Weizhi Wang
[v1] Wed, 7 Jun 2023 13:58:55 UTC (2,815 KB)
