Skip to main content

Showing 1–3 of 3 results for author: Kurpath, M I

Searching in archive cs. Search in all archives.
.
  1. arXiv:2506.07032  [pdf, ps, other

    cs.CL cs.CV

    A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

    Authors: Bhuiyan Sanjid Shafique, Ashmal Vayani, Muhammad Maaz, Hanoona Abdul Rasheed, Dinura Dissanayake, Mohammed Irfan Kurpath, Yahya Hmaiti, Go Inoue, Jean Lahoud, Md. Safirur Rashid, Shadid Intisar Quasem, Maheen Fatima, Franco Vidal, Mykola Maslych, Ketan Pravin More, Sanoojan Baliah, Hasindri Watawana, Yuhao Li, Fabian Farestam, Leon Schaller, Roman Tymtsiv, Simon Weber, Hisham Cholakkal, Ivan Laptev, Shin'ichi Satoh , et al. (4 additional authors not shown)

    Abstract: Large multimodal models (LMMs) have recently gained attention due to their effectiveness to understand and generate descriptions of visual content. Most existing LMMs are in English language. While few recent works explore multilingual image LMMs, to the best of our knowledge, moving beyond the English language for cultural and linguistic inclusivity is yet to be investigated in the context of vid… ▽ More

    Submitted 8 June, 2025; originally announced June 2025.

  2. arXiv:2503.04724  [pdf, other

    cs.CL

    LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

    Authors: Sambal Shikhar, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jean Lahoud, Fahad Khan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal

    Abstract: Recent advancements in speech-to-speech dialogue systems leverage LLMs for multimodal interactions, yet they remain hindered by fine-tuning requirements, high computational overhead, and text-speech misalignment. Existing speech-enabled LLMs often degrade conversational quality by modifying the LLM, thereby compromising its linguistic capabilities. In contrast, we propose LLMVoX, a lightweight 30M… ▽ More

    Submitted 6 March, 2025; originally announced March 2025.

  3. arXiv:2412.07769  [pdf, other

    cs.CV

    BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

    Authors: Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Sara Pieri, Saeed Yahya Alseiari, Shanavas Cholakkal, Khaled Aldahmani, Fahad Khan, Rao Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal

    Abstract: This paper introduces BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM) with a unified architecture that integrates text and visual modalities, enabling advanced image understanding and medical applications. BiMediX2 leverages the Llama3.1 architecture and integrates text and visual capabilities to facilitate seamless interactions in both English and Arabic, su… ▽ More

    Submitted 10 December, 2024; originally announced December 2024.