Compiler Optimization via LLM Reasoning for Efficient Model Serving

Tang, Sujun; Priebe, Christopher; Mahapatra, Rohan; Qin, Lianhui; Esmaeilzadeh, Hadi

Abstract: While model serving has unlocked unprecedented capabilities, the high cost of serving large-scale models remains a significant barrier to widespread accessibility and rapid innovation. Compiler optimizations have long driven substantial performance improvements, but existing compilers struggle with neural workloads because the space of possible transformations is exponentially large and highly interdependent. Existing stochastic search techniques can be effective, but they are often sample-inefficient and fail to leverage the structural context underlying compilation decisions. We investigate whether reasoning with large language models (LLMs), without any retraining, can exploit the context-aware decision space of compiler optimization to significantly improve sample efficiency. To that end, we introduce a novel compilation framework (dubbed REASONING COMPILER) that formulates optimization as a sequential, context-aware decision process, guided by an LLM and structured Monte Carlo tree search (MCTS). The LLM acts as a proposal mechanism, suggesting hardware-aware transformations that reflect the current program state and accumulated performance feedback. MCTS incorporates the LLM-generated proposals to balance exploration and exploitation, enabling structured, context-sensitive traversal of the expansive compiler optimization space. By achieving substantial speedups with markedly fewer samples than leading neural compilers, our approach demonstrates the potential of LLM-guided reasoning to transform the landscape of compiler optimization.
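The search loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the transformation names, the `toy_cost` function, and the `propose` heuristic are all hypothetical stand-ins. In particular, `propose` takes the place of the LLM call and only mimics conditioning on the current program state and on accumulated feedback, and `toy_cost` takes the place of a real hardware measurement.

```python
import math
import random

# Hypothetical transformation vocabulary (an assumption for illustration only).
TRANSFORMS = ["tile", "unroll", "vectorize", "fuse"]


def toy_cost(state):
    """Made-up stand-in for a hardware measurement; lower is better."""
    gains = {"tile": 0.8, "unroll": 0.95, "vectorize": 0.7, "fuse": 0.9}
    cost, seen = 100.0, set()
    for t in state:
        # A repeated transformation is penalized, a new one helps.
        cost *= 1.05 if t in seen else gains[t]
        seen.add(t)
    return cost


def propose(state, history, k=2):
    """Stand-in for the LLM proposal step (no model is called here).

    Uses the current state (skips transformations already applied) and the
    feedback history (untried transformations first, then lowest cost seen).
    """
    unused = [t for t in TRANSFORMS if t not in state] or list(TRANSFORMS)

    def best_seen(t):
        costs = [c for s, c in history if s and s[-1] == t]
        return min(costs) if costs else float("-inf")

    return sorted(unused, key=best_seen)[:k]


class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.total_reward = [], 0, 0.0


def uct(child, parent_visits, c=1.4):
    """Upper-confidence score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(budget=30, max_depth=4, seed=0):
    rng = random.Random(seed)
    root = Node(())
    baseline = toy_cost(())
    best_state, best_cost = (), baseline
    history = []  # accumulated (state, cost) feedback
    for _ in range(budget):
        # Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # Expansion: children come from the proposal step, not random sampling.
        if len(node.state) < max_depth:
            for t in propose(node.state, history):
                node.children.append(Node(node.state + (t,), parent=node))
            node = rng.choice(node.children)
        # Evaluation: one "measurement" per iteration.
        cost = toy_cost(node.state)
        history.append((node.state, cost))
        if cost < best_cost:
            best_state, best_cost = node.state, cost
        # Backpropagation: reward is the speedup over the untransformed program.
        reward = baseline / cost
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_state, best_cost


if __name__ == "__main__":
    print(search(budget=40))
```

In this sketch each iteration spends exactly one evaluation, so `budget` plays the role of the sample count; the proposal step narrows the branching factor to `k` candidates per node, which is one plausible reading of how guided proposals could reduce the number of samples needed.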

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Cite as:	arXiv:2506.01374 [cs.LG]
	(or arXiv:2506.01374v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.01374
