Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi

Deshmukh, Pranita; Kulkarni, Nikita; Kulkarni, Sanhita; Manghani, Kareena; Joshi, Raviraj

doi:10.1109/I2CT61223.2024.10543946

arXiv:2408.03172 (cs)

[Submitted on 6 Aug 2024]

Abstract: With the surge in digital content in low-resource languages, there is a growing demand for Natural Language Processing (NLP) techniques tailored to these languages. BERT (Bidirectional Encoder Representations from Transformers), which serves as the foundation for numerous NLP architectures and language models, is increasingly used to develop low-resource NLP models. Parameter Efficient Fine-Tuning (PEFT) is an approach to fine-tuning Large Language Models (LLMs) that reduces the number of trainable parameters, lowering the computational cost of training while achieving results comparable to a fully fine-tuned model. In this work, we present a study of PEFT methods for the Indic low-resource language Marathi. We conduct a comprehensive analysis of PEFT methods applied to various monolingual and multilingual Marathi BERT models. These approaches are evaluated on prominent text classification datasets such as MahaSent, MahaHate, and MahaNews. We demonstrate that incorporating PEFT techniques significantly speeds up model training, addressing a critical aspect of model development and deployment. In this study, we explore Low-Rank Adaptation of Large Language Models (LoRA) and adapter methods for low-resource text classification. We show that these methods are competitive with full fine-tuning and can be used without loss in accuracy. This study contributes valuable insights into the effectiveness of Marathi BERT models, offering a foundation for the continued advancement of NLP capabilities in Marathi and similar Indic languages.
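To make the parameter reduction behind LoRA concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single frozen weight matrix W of shape d_out x d_in with a trainable low-rank update B·A. The hidden size of 768 matches BERT-base, but the rank of 8 and the scaling factor alpha of 16 are illustrative assumptions that the abstract does not specify, and the function names are hypothetical.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for one weight matrix: full fine-tuning vs. LoRA."""
    full = d_in * d_out            # every entry of W is updated
    lora = rank * (d_in + d_out)   # only A (rank x d_in) and B (d_out x rank)
    return full, lora


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha):
    """Compute W x + (alpha / rank) * B (A x); W stays frozen, A and B train."""
    rank = len(A)
    base = matvec(W, x)                # frozen pretrained projection
    update = matvec(B, matvec(A, x))   # low-rank correction
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Assumed sizes: 768 is the BERT-base hidden size; rank 8 is illustrative.
full, lora = lora_param_counts(768, 768, 8)
print(full, lora, round(lora / full, 4))

# Toy example: with B initialised to zeros, the LoRA output equals the
# frozen model's output, so training starts from the pretrained behaviour.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]      # rank 1, shape (1, 2)
B = [[0.0], [0.0]]     # shape (2, 1), zero-initialised
x = [1.0, 1.0]
print(lora_forward(x, W, A, B, alpha=16))
```

Under these assumed sizes, the low-rank factors hold about 2% of the parameters of the full matrix. This illustrates why PEFT methods can lower training cost, but it says nothing about the accuracy or speed figures reported in the paper.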

Comments:	Accepted at I2CT 2024
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2408.03172 [cs.CL]
	(or arXiv:2408.03172v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2408.03172
Related DOI:	https://doi.org/10.1109/I2CT61223.2024.10543946

Submission history

From: Raviraj Joshi
[v1] Tue, 6 Aug 2024 13:16:16 UTC (260 KB)
