
Showing 1–11 of 11 results for author: Das, R J

  1. arXiv:2504.06011

    cs.CL

    Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi

    Authors: Monojit Choudhury, Shivam Chauhan, Rocktim Jyoti Das, Dhruv Sahnan, Xudong Han, Haonan Li, Aaryamonvikram Singh, Alok Anil Jadhav, Utkarsh Agarwal, Mukund Choudhary, Debopriyo Banerjee, Fajri Koto, Junaid Bhat, Awantika Shukla, Samujjwal Ghosh, Samta Kamboj, Onkar Pandit, Lalit Pradhan, Rahul Pal, Sunil Sahu, Soundar Doraiswamy, Parvez Mullah, Ali El Filali, Neha Sengupta, Gokul Ramakrishnan, et al. (5 additional authors not shown)

    Abstract: Developing high-quality large language models (LLMs) for moderately resourced languages presents unique challenges in data availability, model adaptation, and evaluation. We introduce Llama-3-Nanda-10B-Chat, or Nanda for short, a state-of-the-art Hindi-centric instruction-tuned generative LLM, designed to push the boundaries of open-source Hindi language models. Built upon Llama-3-8B, Nanda incorp…

    Submitted 8 April, 2025; originally announced April 2025.

  2. arXiv:2412.01928

    cs.LG cs.AI

    MALT: Improving Reasoning with Multi-Agent LLM Training

    Authors: Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Rafael Rafailov, Ivan Laptev, Philip H. S. Torr, Fabio Pizzati, Ronald Clark, Christian Schroeder de Witt

    Abstract: Large Language Models (LLMs) often produce answers with a single chain-of-thought, which restricts their ability to explore reasoning paths or self-correct flawed outputs in complex tasks. In this paper, we introduce MALT (Multi-Agent LLM Training), a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps using a sequential pipeline of h…

    Submitted 27 February, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  3. arXiv:2411.17636

    cs.RO cs.AI

    MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation

    Authors: Harsh Singh, Rocktim Jyoti Das, Mingfei Han, Preslav Nakov, Ivan Laptev

    Abstract: Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. While recent efforts in robotics have leveraged LLMs for both high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to the generation of plans i…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 48 pages

  4. arXiv:2410.14204

    cs.CL

    MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations

    Authors: Vishal Vivek Saley, Goonjan Saha, Rocktim Jyoti Das, Dinesh Raghu, Mausam

    Abstract: Medical task-oriented dialogue systems can assist doctors by collecting patient medical history, aiding in diagnosis, or guiding treatment selection, thereby reducing doctor burnout and expanding access to medical services. However, doctor-patient dialogue datasets are not readily available, primarily due to privacy regulations. Moreover, existing datasets lack comprehensive annotations involving…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: EMNLP2024 Camera Ready Version

  5. arXiv:2405.15585

    cs.CL

    Synergizing In-context Learning with Hints for End-to-end Task-oriented Dialog Systems

    Authors: Vishal Vivek Saley, Rocktim Jyoti Das, Dinesh Raghu, Mausam

    Abstract: End-to-end Task-Oriented Dialog (TOD) systems typically require extensive training datasets to perform well. In contrast, large language model (LLM) based TOD systems can excel even with limited data due to their ability to learn tasks through in-context exemplars. However, these models lack alignment with the style of responses in training data and often generate comprehensive responses, making i…

    Submitted 18 October, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: EMNLP2024 Camera-Ready Version

  6. arXiv:2404.17342

    cs.CL cs.AI

    From Multiple-Choice to Extractive QA: A Case Study for English and Arabic

    Authors: Teresa Lynn, Malik H. Altakrori, Samar Mohamed Magdy, Rocktim Jyoti Das, Chenyang Lyu, Mohamed Nasr, Younes Samih, Kirill Chirkunov, Alham Fikri Aji, Preslav Nakov, Shantanu Godbole, Salim Roukos, Radu Florian, Nizar Habash

    Abstract: The rapid evolution of Natural Language Processing (NLP) has favoured major languages such as English, leaving a significant gap for many others due to limited resources. This is especially evident in the context of data annotation, a task whose importance cannot be overstated, but which is time-consuming and costly. Thus, any dataset for resource-poor languages is precious, in particular when…

    Submitted 24 January, 2025; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Paper 8 pages, Appendix 12 pages. Published at COLING2025

  7. arXiv:2403.10378

    cs.CL cs.CV

    EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models

    Authors: Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov

    Abstract: We introduce EXAMS-V, a new challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models. It consists of 20,932 multiple-choice questions across 20 school disciplines covering natural science, social science, and other miscellaneous studies, e.g., religion, fine arts, and business. EXAMS-V includes a variety of multimodal features such as text, images,…

    Submitted 15 March, 2024; originally announced March 2024.

  8. arXiv:2402.02420

    cs.CL cs.AI

    Factuality of Large Language Models: A Survey

    Authors: Yuxia Wang, Minghan Wang, Muhammad Arslan Manzoor, Fei Liu, Georgi Georgiev, Rocktim Jyoti Das, Preslav Nakov

    Abstract: Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicabili…

    Submitted 31 October, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 11 pages, 1 figure and 2 tables

  9. arXiv:2311.04902

    cs.CL cs.AI cs.LG

    Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

    Authors: Rocktim Jyoti Das, Mingjie Sun, Liqun Ma, Zhiqiang Shen

    Abstract: Large Language Models (LLMs) with billions of parameters are prime targets for network pruning, which removes some model weights without hurting performance. Prior approaches such as magnitude pruning, SparseGPT, and Wanda either concentrated solely on weights or integrated weights with activations for sparsity. However, they overlooked the informative gradients derived from pretrained LLMs. In this p…

    Submitted 8 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Code and models at https://github.com/VILA-Lab/GBLM-Pruner

  10. arXiv:2305.16697

    cs.CL

    DKAF: KB Arbitration for Learning Task-Oriented Dialog Systems with Dialog-KB Inconsistencies

    Authors: Vishal Vivek Saley, Rocktim Jyoti Das, Dinesh Raghu, Mausam

    Abstract: Task-oriented dialog (TOD) agents often ground their responses on external knowledge bases (KBs). These KBs can be dynamic and may be updated frequently. Existing approaches for learning TOD agents assume the KB snapshot contemporary to each individual dialog is available during training. However, in real-world scenarios, only the latest KB snapshot is available during training and as a result, th…

    Submitted 26 May, 2023; originally announced May 2023.

  11. arXiv:2303.09128

    cs.CL cs.LG cs.SE

    Exploring Distributional Shifts in Large Language Models for Code Analysis

    Authors: Shushan Arakelyan, Rocktim Jyoti Das, Yi Mao, Xiang Ren

    Abstract: We systematically study how three large language models with code capabilities (CodeT5, Codex, and ChatGPT) generalize to out-of-domain data. We consider two fundamental applications: code summarization and code generation. We split data into domains following its natural boundaries: by organization, by project, and by module within a software project. We establish that samples from…

    Submitted 5 December, 2023; v1 submitted 16 March, 2023; originally announced March 2023.