A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Yan, Yibo; Su, Jiamin; He, Jianxiang; Fu, Fangteng; Zheng, Xu; Lyu, Yuanhuiyi; Wang, Kun; Wang, Shen; Wen, Qingsong; Hu, Xuming

arXiv:2412.11936 (cs)

[Submitted on 16 Dec 2024 (v1), last revised 18 Feb 2025 (this version, v2)]

Abstract: Mathematical reasoning, a core aspect of human cognition, is vital across many domains, from educational problem-solving to scientific advancements. As artificial general intelligence (AGI) progresses, integrating large language models (LLMs) with mathematical reasoning tasks is becoming increasingly significant. This survey provides the first comprehensive analysis of mathematical reasoning in the era of multimodal large language models (MLLMs). We review over 200 studies published since 2021 and examine the state-of-the-art developments in Math-LLMs, with a focus on multimodal settings. We categorize the field into three dimensions: benchmarks, methodologies, and challenges. In particular, we explore the multimodal mathematical reasoning pipeline, as well as the role of (M)LLMs and the associated methodologies. Finally, we identify five major challenges hindering the realization of AGI in this domain, offering insights into future directions for enhancing multimodal reasoning capabilities. This survey serves as a critical resource for the research community in advancing the capabilities of LLMs to tackle complex multimodal reasoning tasks.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.11936 [cs.CL]
	(or arXiv:2412.11936v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.11936

Submission history

From: Yibo Yan
[v1] Mon, 16 Dec 2024 16:21:41 UTC (812 KB)
[v2] Tue, 18 Feb 2025 02:37:21 UTC (824 KB)
