A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning

Hong, Guan Zhe; Dikkala, Nishanth; Luo, Enming; Rashtchian, Cyrus; Wang, Xin; Panigrahy, Rina

arXiv:2411.04105 (cs)

[Submitted on 6 Nov 2024 (v1), last revised 19 Jun 2025 (this version, v4)]

Abstract: Due to the size and complexity of modern large language models (LLMs), it has proven challenging to uncover the underlying mechanisms that models use to solve reasoning problems. For instance, is their reasoning for a specific problem localized to certain parts of the network? Do they break down the reasoning problem into modular components that are then executed as sequential steps as we go deeper in the model? To better understand the reasoning capability of LLMs, we study a minimal propositional logic problem that requires combining multiple facts to arrive at a solution. By studying this problem on Mistral and Gemma models, up to 27B parameters, we illuminate the core components the models use to solve such logic problems. From a mechanistic interpretability point of view, we use causal mediation analysis to uncover the pathways and components of the LLMs' reasoning processes. Then, we offer fine-grained insights into the functions of attention heads in different layers. We not only find a sparse circuit that computes the answer, but also decompose it into sub-circuits that have four distinct and modular uses. Finally, we reveal that three distinct models (Mistral-7B, Gemma-2-9B, and Gemma-2-27B) contain analogous but not identical mechanisms.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2411.04105 [cs.LG] (or arXiv:2411.04105v4 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2411.04105

Submission history

From: Guan Zhe Hong
[v1] Wed, 6 Nov 2024 18:35:32 UTC (3,268 KB)
[v2] Thu, 7 Nov 2024 03:50:19 UTC (3,268 KB)
[v3] Mon, 9 Dec 2024 16:36:34 UTC (5,576 KB)
[v4] Thu, 19 Jun 2025 20:14:18 UTC (5,309 KB)
