Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding

Gautam, Aayush; Shrestha, Susav; Reddy, Narasimha

arXiv:2504.00030 (cs)

[Submitted on 28 Mar 2025 (v1), last revised 4 Jun 2025 (this version, v3)]

Abstract: Speculative decoding accelerates large language model (LLM) inference by using a smaller draft model to propose tokens, which are then verified by a larger target model. However, selecting an optimal speculation length is critical for maximizing speedup while minimizing wasted computation. We introduce GammaTune and GammaTune+, training-free adaptive algorithms that dynamically adjust speculation length based on token acceptance rates using a heuristic-based switching mechanism. Evaluated on SpecBench across multiple tasks and model pairs, our method outperforms other heuristic-based approaches and fixed-length speculative decoding, achieving an average speedup of 15% (±5%) with GammaTune and 16% (±3%) with GammaTune+, while reducing performance variance. This makes GammaTune a robust and efficient solution for real-world deployment.
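
The abstract does not spell out the switching heuristic, so the following is only a minimal sketch of the general idea of adapting the speculation length to observed token acceptance. It is not the authors' GammaTune algorithm. The class name AdaptiveGamma, the exponential-moving-average rule, the smoothing factor, and the bounds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveGamma:
    """Hypothetical controller for the speculation length (gamma).

    Illustrative only: tracks a moving average of accepted draft tokens
    and drafts roughly one token more than that average on the next step.
    """

    gamma: int = 4              # draft tokens proposed per step (assumed start)
    gamma_min: int = 1          # assumed lower bound
    gamma_max: int = 16         # assumed upper bound
    alpha: float = 0.5          # assumed smoothing factor for the average
    avg_accepted: float = 4.0   # running average of accepted tokens

    def update(self, accepted: int) -> int:
        """Record the accepted count for the last step and return the next gamma."""
        if not 0 <= accepted <= self.gamma:
            raise ValueError("accepted must be between 0 and the current gamma")
        # Exponential moving average of how many draft tokens were accepted.
        self.avg_accepted = (
            self.alpha * accepted + (1 - self.alpha) * self.avg_accepted
        )
        # Round half up, then add one so gamma can grow when all drafts pass.
        proposal = int(self.avg_accepted + 0.5) + 1
        self.gamma = max(self.gamma_min, min(self.gamma_max, proposal))
        return self.gamma


if __name__ == "__main__":
    controller = AdaptiveGamma()
    # Two fully accepted steps followed by four fully rejected steps.
    for accepted in (4, 5, 0, 0, 0, 0):
        print(accepted, controller.update(accepted))
```

In this sketch the speculation length grows while the target model accepts every drafted token and shrinks toward the lower bound after repeated rejections, which is the trade-off between speedup and wasted computation that the abstract describes.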

Comments:	6 pages, 2 figures, 1 table
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2504.00030 [cs.CL]
	(or arXiv:2504.00030v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.00030

Submission history

From: Susav Shrestha
[v1] Fri, 28 Mar 2025 23:41:55 UTC (1,203 KB)
[v2] Thu, 3 Apr 2025 12:31:40 UTC (1,203 KB)
[v3] Wed, 4 Jun 2025 06:07:17 UTC (1,200 KB)
