Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models

Agarwal, Aradhye; Ramesh, Suhas K; Sengupta, Ayan; Chakraborty, Tanmoy

arXiv:2408.14470 (cs)

[Submitted on 26 Aug 2024 (v1), last revised 23 Jun 2025 (this version, v3)]

Abstract: Fine-tuning large language models (LLMs) on downstream tasks requires substantial computational resources. Selective PEFT, a class of parameter-efficient fine-tuning (PEFT) methodologies, aims to mitigate these computational challenges by selectively fine-tuning only a small fraction of the model parameters. Although parameter-efficient, these techniques often fail to match the performance of fully fine-tuned models, primarily due to inherent biases introduced during parameter selection. Traditional selective PEFT techniques use a fixed set of parameters selected using different importance heuristics, failing to capture parameter importance dynamically and often leading to suboptimal performance. We introduce $\text{ID}^3$, a novel selective PEFT method that calculates parameter importance continually and dynamically unmasks parameters by balancing exploration and exploitation in parameter selection. Our empirical study on 16 tasks spanning natural language understanding, mathematical reasoning, and summarization demonstrates the effectiveness of our method compared to fixed-masking selective PEFT techniques. We analytically show that $\text{ID}^3$ reduces the number of gradient updates by a factor of two, enhancing computational efficiency. Since $\text{ID}^3$ is robust to random initialization of neurons and operates directly on the optimization process, it is highly flexible and can be integrated with existing additive and reparametrization-based PEFT techniques such as adapters and LoRA, respectively.
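The general idea of step-by-step unmasking can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the importance heuristic or the unmasking schedule, so the sketch assumes gradient magnitude as the importance score, a uniform number of newly unmasked parameters per step, and a simple quadratic loss. The names `unmask_top_k` and `finetune` are hypothetical, and the exploration-exploitation balance described in the abstract is not modeled.

```python
def unmask_top_k(mask, scores, k):
    """Unmask the k highest-scoring parameters that are still masked."""
    candidates = [i for i, unmasked in enumerate(mask) if not unmasked]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    chosen = candidates[:k]
    for i in chosen:
        mask[i] = True
    return chosen


def finetune(theta, target, budget, steps, lr=0.1):
    """Toy selective fine-tuning on the loss 0.5 * sum((theta - target)^2).

    Assumptions (not taken from the paper): importance is |gradient|,
    and the same number of parameters is unmasked at every step.
    """
    theta = list(theta)
    mask = [False] * len(theta)          # True means the parameter is trainable
    per_step = max(1, budget // steps)   # uniform unmasking schedule
    for _ in range(steps):
        grads = [t - g for t, g in zip(theta, target)]
        scores = [abs(g) for g in grads]  # importance recomputed every step
        remaining = budget - sum(mask)
        unmask_top_k(mask, scores, min(per_step, remaining))
        # Only unmasked parameters receive a gradient update.
        theta = [t - lr * g if m else t for t, g, m in zip(theta, grads, mask)]
    return theta, mask


theta, mask = finetune([0.0, 0.0, 0.0, 0.0], [1.0, -2.0, 3.0, 0.5], budget=2, steps=2)
print(mask)
```

In this sketch a parameter unmasked late in training receives fewer updates than one unmasked early, so the total number of parameter updates is smaller than it would be if the full budget were trainable from the first step. The paper's own analysis of the factor-of-two reduction may rest on different assumptions.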

Comments:	15 pages, 7 tables, 9 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2408.14470 [cs.CL]
	(or arXiv:2408.14470v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2408.14470

Submission history

From: Aradhye Agarwal
[v1] Mon, 26 Aug 2024 17:58:53 UTC (493 KB)
[v2] Tue, 27 Aug 2024 03:56:11 UTC (493 KB)
[v3] Mon, 23 Jun 2025 16:25:27 UTC (169 KB)
