Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples

Saparov, Abulhair; Pang, Richard Yuanzhe; Padmakumar, Vishakh; Joshi, Nitish; Kazemi, Seyed Mehran; Kim, Najoung; He, He

arXiv:2305.15269 (cs)

[Submitted on 24 May 2023 (v1), last revised 3 Nov 2023 (this version, v3)]

Abstract: Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, and from the same distribution as the in-context examples. To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations from multiple angles: depth-, width-, and compositional generalization. To facilitate systematic exploration, we construct a new synthetic and programmable reasoning dataset that enables control over deduction rules and proof complexity. Our experiments on four LLMs of various sizes and training objectives show that they are able to generalize to compositional proofs. However, they have difficulty generalizing to longer proofs, and they require explicit demonstrations to produce hypothetical subproofs, specifically in proof by cases and proof by contradiction.
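To make the idea of depth generalization concrete, the following is a minimal sketch of a toy generator for modus ponens chains whose proof depth is a parameter. It is not the authors' dataset or code: the function names (`make_chain_example`, `check_chain`), the concept vocabulary, the sentence templates, and the entity name "Alex" are all assumptions invented for illustration. It covers only one deduction rule, whereas the abstract describes a broader set.

```python
import random

# Placeholder concept names, invented for this illustration only.
CONCEPTS = ["wumpus", "yumpus", "zumpus", "dumpus",
            "rompus", "numpus", "tumpus", "vumpus"]


def make_chain_example(depth, entity="Alex", seed=0):
    """Build a toy example whose proof applies modus ponens `depth` times."""
    if not 1 <= depth < len(CONCEPTS):
        raise ValueError("depth must be between 1 and len(CONCEPTS) - 1")
    rng = random.Random(seed)
    # Pick depth + 1 distinct concepts and link them into a chain of rules.
    names = rng.sample(CONCEPTS, depth + 1)
    rules = list(zip(names, names[1:]))
    premises = [f"Every {a} is a {b}." for a, b in rules]
    premises.append(f"{entity} is a {names[0]}.")
    rng.shuffle(premises)  # premise order should not give the proof away
    # Each proof step is the conclusion of one modus ponens application.
    proof = [f"{entity} is a {b}." for _, b in rules]
    return {
        "premises": premises,
        "query": proof[-1],
        "proof": proof,
        "rules": rules,
        "start": names[0],
    }


def check_chain(example, entity="Alex"):
    """Return True if every proof step follows from the previous one by a rule."""
    successor = dict(example["rules"])  # concept -> concept it implies
    current = example["start"]
    for step in example["proof"]:
        nxt = successor.get(current)
        if nxt is None or step != f"{entity} is a {nxt}.":
            return False
        current = nxt
    return f"{entity} is a {current}." == example["query"]


if __name__ == "__main__":
    demo = make_chain_example(depth=1, seed=1)  # simple in-context demonstration
    test = make_chain_example(depth=3, seed=2)  # deeper held-out test proof
    print(len(demo["proof"]), len(test["proof"]))
    print(check_chain(test))
```

In a setup like this, in-context demonstrations could be drawn at a small depth and test items at a larger one, so that a correct answer requires a longer proof than any shown in the prompt. Width and compositional generalization would need rules with several premises and mixtures of deduction rules, which this sketch does not attempt.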

Comments:	Published as a conference paper at NeurIPS 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.15269 [cs.CL]
	(or arXiv:2305.15269v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.15269

Submission history

From: Abulhair Saparov
[v1] Wed, 24 May 2023 15:55:51 UTC (1,159 KB)
[v2] Fri, 20 Oct 2023 16:34:01 UTC (1,078 KB)
[v3] Fri, 3 Nov 2023 18:45:56 UTC (1,051 KB)
