Showing 1–2 of 2 results for author: Dhamaskar, M A

Search v0.5.6 released 2020-02-24

arXiv:2502.04245 [pdf]

cs.CL cs.AI cs.LG

TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi

Authors: Mohammed Amaan Dhamaskar, Rasika Ransing

Abstract: India's rich cultural and linguistic diversity poses various challenges in the domain of Natural Language Processing (NLP), particularly in Named Entity Recognition (NER). NER is a NLP task that aims to identify and classify tokens into different entity groups like Person, Location, Organization, Number, etc. This makes NER very useful for downstream tasks like context-aware anonymization. This pa… ▽ More India's rich cultural and linguistic diversity poses various challenges in the domain of Natural Language Processing (NLP), particularly in Named Entity Recognition (NER). NER is a NLP task that aims to identify and classify tokens into different entity groups like Person, Location, Organization, Number, etc. This makes NER very useful for downstream tasks like context-aware anonymization. This paper details our work to build a multilingual NER model for the three most spoken languages in India - Hindi, Bengali & Marathi. We train a custom transformer model and fine tune a few pretrained models, achieving an F1 Score of 92.11 for a total of 6 entity groups. Through this paper, we aim to introduce a single model to perform NER and significantly reduce the inconsistencies in entity groups and tag names, across the three languages. △ Less

Submitted 6 February, 2025; originally announced February 2025.
arXiv:2412.18163 [pdf]

cs.CL cs.AI

Survey of Pseudonymization, Abstractive Summarization & Spell Checker for Hindi and Marathi

Authors: Rasika Ransing, Mohammed Amaan Dhamaskar, Ayush Rajpurohit, Amey Dhoke, Sanket Dalvi

Abstract: India's vast linguistic diversity presents unique challenges and opportunities for technological advancement, especially in the realm of Natural Language Processing (NLP). While there has been significant progress in NLP applications for widely spoken languages, the regional languages of India, such as Marathi and Hindi, remain underserved. Research in the field of NLP for Indian regional language… ▽ More India's vast linguistic diversity presents unique challenges and opportunities for technological advancement, especially in the realm of Natural Language Processing (NLP). While there has been significant progress in NLP applications for widely spoken languages, the regional languages of India, such as Marathi and Hindi, remain underserved. Research in the field of NLP for Indian regional languages is at a formative stage and holds immense significance. The paper aims to build a platform which enables the user to use various features like text anonymization, abstractive text summarization and spell checking in English, Hindi and Marathi language. The aim of these tools is to serve enterprise and consumer clients who predominantly use Indian Regional Languages. △ Less

Submitted 23 December, 2024; originally announced December 2024.

Search v0.5.6 released 2020-02-24