
Showing 1–4 of 4 results for author: Thawkar, R

Searching in archive cs.
  1. arXiv:2505.18152 [pdf, ps, other]

    cs.CL

    Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

    Authors: Wafa Alghallabi, Ritesh Thawkar, Sara Ghaboura, Ketan More, Omkar Thawakar, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer

    Abstract: Arabic poetry is one of the richest and most culturally rooted forms of expression in the Arabic language, known for its layered meanings, stylistic diversity, and deep historical continuity. Although large language models (LLMs) have demonstrated strong performance across languages and tasks, their ability to understand Arabic poetry remains largely unexplored. In this work, we introduce Fa…

    Submitted 26 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: GitHub: https://github.com/mbzuai-oryx/FannOrFlop, Dataset: https://huggingface.co/datasets/omkarthawakar/FannOrFlop

  2. arXiv:2503.10621 [pdf, other]

    cs.CV cs.RO

    DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

    Authors: Ayesha Ishaq, Jean Lahoud, Ketan More, Omkar Thawakar, Ritesh Thawkar, Dinura Dissanayake, Noor Ahsan, Yuhao Li, Fahad Shahbaz Khan, Hisham Cholakkal, Ivan Laptev, Rao Muhammad Anwer, Salman Khan

    Abstract: While large multimodal models (LMMs) have demonstrated strong performance across various Visual Question Answering (VQA) tasks, certain challenges require complex multi-step reasoning to reach accurate answers. One particularly challenging task is autonomous driving, which demands thorough cognitive processing before decisions can be made. In this domain, a sequential and interpretive understandin…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 8 pages, 4 figures, 3 tables, GitHub: https://github.com/ayesha-ishaq/DriveLMM-o1

  3. arXiv:2502.14865 [pdf, other]

    cs.CV cs.LG

    Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

    Authors: Sara Ghaboura, Ketan More, Ritesh Thawkar, Wafa Alghallabi, Omkar Thawakar, Fahad Shahbaz Khan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer

    Abstract: Understanding historical and cultural artifacts demands human expertise and advanced computational techniques, yet the process remains complex and time-intensive. While large multimodal models offer promising support, their evaluation and improvement require a standardized benchmark. To address this, we introduce TimeTravel, a benchmark of 10,250 expert-verified samples spanning 266 distinct cultu…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 4 pages, 6 figures

  4. arXiv:2501.06186 [pdf, other]

    cs.CV

    LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

    Authors: Omkar Thawakar, Dinura Dissanayake, Ketan More, Ritesh Thawkar, Ahmed Heakl, Noor Ahsan, Yuhao Li, Mohammed Zumri, Jean Lahoud, Rao Muhammad Anwer, Hisham Cholakkal, Ivan Laptev, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan

    Abstract: Reasoning is a fundamental capability for solving complex multi-step problems, particularly in visual contexts where sequential step-wise understanding is essential. Existing approaches lack a comprehensive framework for evaluating visual reasoning and do not emphasize step-wise problem-solving. To this end, we propose a comprehensive framework for advancing step-by-step visual reasoning in large…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: 15 pages, 5 figures