-
Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation
Authors:
Shaharukh Khan,
Ayush Tarun,
Ali Faraz,
Palash Kamble,
Vivek Dahiya,
Praveen Pokala,
Ashish Kulkarni,
Chandra Khatri,
Abhinav Ravi,
Shubham Agarwal
Abstract:
In this work, we provide the system description of our submission to the English to Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). We introduce Chitranuvad, a multimodal model that integrates a multilingual LLM and a vision module for multimodal translation. Our method uses a ViT image encoder to extract visual representations as visual token embeddings, which an adapter layer projects into the LLM embedding space; the LLM then generates the translation autoregressively. We participated in all three tracks (image captioning, text-only translation, and multimodal translation) for Indic languages (i.e., English to Hindi, Bengali, and Malayalam) and achieved SOTA results for Hindi in all of them on the Challenge set, while remaining competitive for the other languages in the shared task.
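The pipeline described above (ViT visual tokens, an adapter projection into the LLM embedding space, then autoregressive decoding) can be illustrated with a minimal sketch. It assumes the adapter is a single linear layer and that the projected visual tokens are simply prepended to the embedded text prompt; the function names, toy dimensions, and random weights are hypothetical and not taken from Chitranuvad, and the LLM decoding step itself is omitted.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to a single vector x (pure Python, no NumPy)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]

def project_visual_tokens(visual_tokens, weight, bias):
    """Hypothetical adapter: map each ViT patch embedding (dim d_vis)
    into the LLM embedding space (dim d_llm)."""
    return [linear(tok, weight, bias) for tok in visual_tokens]

def build_llm_input(visual_tokens, text_embeddings, weight, bias):
    """Prepend projected visual tokens to the embedded text prompt.
    An LLM would then decode the translation autoregressively from this sequence."""
    return project_visual_tokens(visual_tokens, weight, bias) + text_embeddings

# Hypothetical toy sizes: 4 image patches of dim 3, LLM hidden size 5.
rng = random.Random(0)
d_vis, d_llm, n_patches = 3, 5, 4
W = [[rng.gauss(0, 0.02) for _ in range(d_vis)] for _ in range(d_llm)]
b = [0.0] * d_llm
patches = [[rng.gauss(0, 1) for _ in range(d_vis)] for _ in range(n_patches)]
prompt = [[0.1] * d_llm for _ in range(2)]  # two embedded prompt tokens

seq = build_llm_input(patches, prompt, W, b)
print(len(seq), len(seq[0]))  # 6 5
```

A single linear adapter is the simplest choice; an MLP adapter would slot into the same place, since only its output dimension has to match the LLM's hidden size.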
Submitted 27 February, 2025;
originally announced February 2025.
-
Chitrarth: Bridging Vision and Language for a Billion People
Authors:
Shaharukh Khan,
Ayush Tarun,
Abhinav Ravi,
Ali Faraz,
Akshat Patidar,
Praveen Kumar Pokala,
Anagha Bhangare,
Raja Kolla,
Chandra Khatri,
Shubham Agarwal
Abstract:
Recent multimodal foundation models are primarily trained on English or high-resource European language data, which hinders their applicability to medium- and low-resource languages. To address this limitation, we introduce Chitrarth (Chitra: Image; Artha: Meaning), an inclusive Vision-Language Model (VLM) specifically targeting the rich linguistic diversity and visual reasoning across 10 prominent Indian languages. Our model integrates a state-of-the-art (SOTA) multilingual Large Language Model (LLM) with a vision module and is primarily trained on multilingual image-text data. We also introduce BharatBench, a comprehensive framework for evaluating VLMs across various Indian languages, ultimately contributing to more diverse and effective AI systems. Our model achieves SOTA results on benchmarks across low-resource languages while retaining its efficiency in English. Through our research, we aim to set new benchmarks in multilingual-multimodal capabilities, offering substantial improvements over existing models and establishing a foundation for future advancements in this area.
Submitted 21 February, 2025;
originally announced February 2025.
-
Improvement of Heatbath Algorithm in LFT using Generative models
Authors:
Ali Faraz,
Ankur Singha,
Dipankar Chakrabarti,
Shinichi Nakajima,
Vipul Arora
Abstract:
The heatbath algorithm is commonly used for sampling in local lattice field theories, but performing exact updates, i.e. sampling from the local density, is challenging when dealing with continuous variables. Heatbath methods rely on rejection-based sampling at each site, which can suffer from low acceptance rates if the proposal distribution is not well chosen, and choosing it is a nontrivial task. In this work, we propose a novel, straightforward approach that uses generative AI models to generate proposals at each lattice site for the φ⁴ and XY models. The method learns a local distribution conditioned on both the neighboring sites and the action parameter values, without requiring training samples from the target distribution.
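To make the rejection step concrete, here is a minimal sketch of one heatbath sweep for a two-dimensional φ⁴ model. It assumes the common lattice parametrization S = Σ_x [−2κ φ_x Σ_μ φ_{x+μ} + φ_x² + λ(φ_x² − 1)²], a periodic L × L lattice, and a hand-chosen Gaussian proposal; none of these details come from the abstract, and the function names are hypothetical. The Gaussian proposal stands in for the learned generative proposal the paper describes: accepted draws are exact samples of the local density, but the acceptance rate falls as λ grows, which is the kind of inefficiency the abstract refers to.

```python
import math
import random

def local_heatbath_update(h, kappa, lam, rng, max_tries=10_000):
    """Draw phi_x from p(phi) ∝ exp(2*kappa*h*phi - phi**2 - lam*(phi**2 - 1)**2)
    by rejection sampling, where h is the sum of the neighbouring field values.

    Proposal: the Gaussian part exp(-(phi - kappa*h)**2), i.e. N(kappa*h, var=1/2).
    Since lam >= 0, the leftover factor exp(-lam*(phi**2 - 1)**2) <= 1 is a valid
    acceptance probability, so accepted draws are exact samples.
    Returns (sample, number_of_proposals)."""
    mean, std = kappa * h, math.sqrt(0.5)
    for tries in range(1, max_tries + 1):
        phi = rng.gauss(mean, std)
        if rng.random() < math.exp(-lam * (phi * phi - 1.0) ** 2):
            return phi, tries
    raise RuntimeError("acceptance rate too low for this proposal")

def heatbath_sweep(field, L, kappa, lam, rng):
    """One sequential sweep over an L x L periodic lattice stored as a flat list.
    Updates `field` in place and returns the mean acceptance rate of the sweep."""
    total_tries = 0
    for i in range(L):
        for j in range(L):
            # Sum of the four nearest neighbours (periodic boundaries).
            h = (field[((i + 1) % L) * L + j] + field[((i - 1) % L) * L + j]
                 + field[i * L + (j + 1) % L] + field[i * L + (j - 1) % L])
            field[i * L + j], tries = local_heatbath_update(h, kappa, lam, rng)
            total_tries += tries
    return (L * L) / total_tries

# Usage: compare acceptance rates for a weak and a strong quartic coupling.
rng = random.Random(1)
L = 8
field = [0.0] * (L * L)
rates = {}
for lam in (0.5, 5.0):
    rates[lam] = heatbath_sweep(field, L, kappa=0.2, lam=lam, rng=rng)
    print(f"lambda={lam}: acceptance rate {rates[lam]:.2f}")
```

Because the acceptance probability exp(−λ(φ² − 1)²) shrinks as λ increases, the λ = 5 rate is expected to be lower than the λ = 0.5 rate; a proposal that matches the local density more closely, such as a learned conditional model, would raise it.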
Submitted 28 January, 2025; v1 submitted 16 August, 2023;
originally announced August 2023.