Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference

Wójcik, Bartosz; Devoto, Alessio; Pustelnik, Karol; Minervini, Pasquale; Scardapane, Simone

arXiv:2312.10193 (cs)

[Submitted on 15 Dec 2023 (v1), last revised 18 Dec 2024 (this version, v2)]

Abstract: While transformer models have been highly successful, they are computationally inefficient. We observe that for each layer, the full width of the layer may be needed only for a small subset of tokens inside a batch and that the "effective" width needed to process a token can vary from layer to layer. Motivated by this observation, we introduce the Adaptive Computation Module (ACM), a generic module that dynamically adapts its computational load to match the estimated difficulty of the input on a per-token basis. An ACM consists of a sequence of learners that progressively refine the output of their preceding counterparts. An additional gating mechanism determines the optimal number of learners to execute for each token. We also propose a distillation technique to replace any pre-trained model with an "ACMized" variant. Our evaluation of transformer models in computer vision and speech recognition demonstrates that substituting layers with ACMs significantly reduces inference costs without degrading the downstream accuracy for a wide interval of user-defined budgets.
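
The mechanism described in the abstract (a sequence of learners whose outputs refine one another, plus a gate that picks how many learners to run per token) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the names `ToyACM`, `choose_k`, and `forward_token` are hypothetical, the learners are plain linear maps, refinement is modeled as additive accumulation, and the gate is an untrained linear scorer followed by an argmax.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def make_matrix(rows, cols, rng, scale=0.1):
    """Random matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyACM:
    """Hypothetical toy module: N small learners plus a per-token gate."""

    def __init__(self, dim, num_learners, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Assumption: each learner is a single linear map of the token.
        self.learners = [make_matrix(dim, dim, rng) for _ in range(num_learners)]
        # The gate scores num_learners + 1 options: run 0, 1, ..., N learners.
        self.gate = make_matrix(num_learners + 1, dim, rng, scale=1.0)

    def choose_k(self, token):
        """Return how many learners to run for this token (argmax of gate scores)."""
        scores = matvec(self.gate, token)
        return max(range(len(scores)), key=scores.__getitem__)

    def forward_token(self, token, k=None):
        """Run only the first k learners; each one adds a refinement to the output."""
        if k is None:
            k = self.choose_k(token)
        out = [0.0] * self.dim
        for W in self.learners[:k]:
            refinement = matvec(W, token)
            out = [o + r for o, r in zip(out, refinement)]
        return out, k

    def forward(self, tokens):
        """Process tokens independently, so cost can differ from token to token."""
        return [self.forward_token(t) for t in tokens]


# Usage: each token may trigger a different number of learners.
acm = ToyACM(dim=4, num_learners=3)
tokens = [[1.0, 0.0, -1.0, 0.5], [0.0, 0.0, 0.0, 0.0]]
for out, k in acm.forward(tokens):
    print(f"learners executed: {k}, output: {out}")
```

In this sketch the compute spent on a token scales with the `k` chosen by the gate, which is the per-token granularity the abstract refers to. A trained gate and a budget-aware objective, as well as the distillation step, are outside the scope of this toy example.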

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2312.10193 [cs.LG]
	(or arXiv:2312.10193v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.10193

Submission history

From: Bartosz Wójcik
[v1] Fri, 15 Dec 2023 20:39:43 UTC (11,398 KB)
[v2] Wed, 18 Dec 2024 17:13:41 UTC (11,057 KB)
