Skip to main content

Showing 1–12 of 12 results for author: Khan, S U R

Searching in archive cs. Search in all archives.
.
  1. arXiv:2507.04249  [pdf, ps, other

    cs.CY cs.ET

    Ethics by Design: A Lifecycle Framework for Trustworthy AI in Medical Imaging From Transparent Data Governance to Clinically Validated Deployment

    Authors: Umer Sadiq Khan, Saif Ur Rehman Khan

    Abstract: The integration of artificial intelligence (AI) in medical imaging raises crucial ethical concerns at every stage of its development, from data collection to deployment. Addressing these concerns is essential for ensuring that AI systems are developed and implemented in a manner that respects patient rights and promotes fairness. This study aims to explore the ethical implications of AI in medical… ▽ More

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2506.23111  [pdf, ps, other

    cs.CL

    FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes

    Authors: Janki Atul Nawale, Mohammed Safi Ur Rahman Khan, Janani D, Mansi Gupta, Danish Pruthi, Mitesh M. Khapra

    Abstract: Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-c… ▽ More

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted in ACL 2025

  3. arXiv:2505.06381  [pdf, ps, other

    cs.CV

    Robust & Precise Knowledge Distillation-based Novel Context-Aware Predictor for Disease Detection in Brain and Gastrointestinal

    Authors: Saif Ur Rehman Khan, Muhammad Nabeel Asim, Sebastian Vollmer, Andreas Dengel

    Abstract: Medical disease prediction, particularly through imaging, remains a challenging task due to the complexity and variability of medical data, including noise, ambiguity, and differing image quality. Recent deep learning models, including Knowledge Distillation (KD) methods, have shown promising results in brain tumor image identification but still face limitations in handling uncertainty and general… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

  4. arXiv:2503.14209  [pdf, other

    cs.CV

    AI-Driven Diabetic Retinopathy Diagnosis Enhancement through Image Processing and Salp Swarm Algorithm-Optimized Ensemble Network

    Authors: Saif Ur Rehman Khan, Muhammad Nabeel Asim, Sebastian Vollmer, Andreas Dengel

    Abstract: Diabetic retinopathy is a leading cause of blindness in diabetic patients and early detection plays a crucial role in preventing vision loss. Traditional diagnostic methods are often time-consuming and prone to errors. The emergence of deep learning techniques has provided innovative solutions to improve diagnostic efficiency. However, single deep learning models frequently face issues related to… ▽ More

    Submitted 18 March, 2025; originally announced March 2025.

  5. arXiv:2501.07244  [pdf, other

    cs.CV cs.CL

    Can Vision-Language Models Evaluate Handwritten Math?

    Authors: Oikantik Nath, Hanani Bathina, Mohammed Safi Ur Rahman Khan, Mitesh M. Khapra

    Abstract: Recent advancements in Vision-Language Models (VLMs) have opened new possibilities in automatic grading of handwritten student responses, particularly in mathematics. However, a comprehensive study to test the ability of VLMs to evaluate and reason over handwritten content remains absent. To address this gap, we introduce FERMAT, a benchmark designed to assess the ability of VLMs to detect, locali… ▽ More

    Submitted 12 March, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

  6. arXiv:2411.19096  [pdf, other

    cs.CL

    Pralekha: An Indic Document Alignment Evaluation Benchmark

    Authors: Sanjay Suryanarayanan, Haiyue Song, Mohammed Safi Ur Rahman Khan, Anoop Kunchukuttan, Mitesh M. Khapra, Raj Dabre

    Abstract: Mining parallel document pairs poses a significant challenge because existing sentence embedding models often have limited context windows, preventing them from effectively capturing document-level information. Another overlooked issue is the lack of concrete evaluation benchmarks comprising high-quality parallel document pairs for assessing document-level mining approaches, particularly for Indic… ▽ More

    Submitted 28 November, 2024; originally announced November 2024.

    Comments: Work in Progress

  7. arXiv:2411.04699  [pdf, ps, other

    cs.CL

    Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages

    Authors: Ashwin Sankar, Sparsh Jain, Nikhil Narasimhan, Devilal Choudhary, Dhairya Suman, Mohammed Safi Ur Rahman Khan, Anoop Kunchukuttan, Mitesh M Khapra, Raj Dabre

    Abstract: Speech translation for Indian languages remains a challenging task due to the scarcity of large-scale, publicly available datasets that capture the linguistic diversity and domain coverage essential for real-world applications. Existing datasets cover a fraction of Indian languages and lack the breadth needed to train robust models that generalize beyond curated benchmarks. To bridge this gap, we… ▽ More

    Submitted 31 May, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: Accepted at ACL (Main) 2025

  8. arXiv:2411.02538  [pdf, other

    cs.CL

    MILU: A Multi-task Indic Language Understanding Benchmark

    Authors: Sshubam Verma, Mohammed Safi Ur Rahman Khan, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen

    Abstract: Evaluating Large Language Models (LLMs) in low-resource and linguistically diverse languages remains a significant challenge in NLP, particularly for languages using non-Latin scripts like those spoken in India. Existing benchmarks predominantly focus on English, leaving substantial gaps in assessing LLM capabilities in these languages. We introduce MILU, a Multi task Indic Language Understanding… ▽ More

    Submitted 4 February, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

  9. arXiv:2410.13394  [pdf, other

    cs.CL

    Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs

    Authors: Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Dilip Venkatesh, Raj Dabre, Anoop Kunchukuttan, Mitesh M. Khapra

    Abstract: Evaluating machine-generated text remains a significant challenge in NLP, especially for non-English languages. Current methodologies, including automated metrics, human assessments, and LLM-based evaluations, predominantly focus on English, revealing a significant gap in multilingual evaluation frameworks. We introduce the Cross Lingual Auto Evaluation (CIA) Suite, an extensible framework that in… ▽ More

    Submitted 17 October, 2024; originally announced October 2024.

  10. arXiv:2406.13439  [pdf, other

    cs.CL

    Finding Blind Spots in Evaluator LLMs with Interpretable Checklists

    Authors: Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh M. Khapra

    Abstract: Large Language Models (LLMs) are increasingly relied upon to evaluate text outputs of other LLMs, thereby influencing leaderboards and development decisions. However, concerns persist over the accuracy of these assessments and the potential for misleading conclusions. In this work, we investigate the effectiveness of LLMs as evaluators for text generation tasks. We propose FBI, a novel framework d… ▽ More

    Submitted 26 November, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: EMNLP 2024

  11. IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

    Authors: Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad B, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra

    Abstract: Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-re… ▽ More

    Submitted 28 November, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: ACL-2024 Outstanding Paper

  12. arXiv:2401.15006  [pdf, other

    cs.CL cs.AI

    Airavata: Introducing Hindi Instruction-tuned LLM

    Authors: Jay Gala, Thanmay Jayakumar, Jaavid Aktar Husain, Aswanth Kumar M, Mohammed Safi Ur Rahman Khan, Diptesh Kanojia, Ratish Puduppully, Mitesh M. Khapra, Raj Dabre, Rudra Murthy, Anoop Kunchukuttan

    Abstract: We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, which is a collection of diverse instruction-tuning datasets to enable further research for Indic LLMs. Additional… ▽ More

    Submitted 26 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Work in progress