
Showing 1–6 of 6 results for author: Inoue, G

  1. arXiv:2506.07032  [pdf, ps, other]

    cs.CL cs.CV

    A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

    Authors: Bhuiyan Sanjid Shafique, Ashmal Vayani, Muhammad Maaz, Hanoona Abdul Rasheed, Dinura Dissanayake, Mohammed Irfan Kurpath, Yahya Hmaiti, Go Inoue, Jean Lahoud, Md. Safirur Rashid, Shadid Intisar Quasem, Maheen Fatima, Franco Vidal, Mykola Maslych, Ketan Pravin More, Sanoojan Baliah, Hasindri Watawana, Yuhao Li, Fabian Farestam, Leon Schaller, Roman Tymtsiv, Simon Weber, Hisham Cholakkal, Ivan Laptev, Shin'ichi Satoh, et al. (4 additional authors not shown)

    Abstract: Large multimodal models (LMMs) have recently gained attention due to their effectiveness to understand and generate descriptions of visual content. Most existing LMMs are in English language. While few recent works explore multilingual image LMMs, to the best of our knowledge, moving beyond the English language for cultural and linguistic inclusivity is yet to be investigated in the context of vid…

    Submitted 8 June, 2025; originally announced June 2025.

  2. arXiv:2406.05760  [pdf, other]

    cs.CL

    Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization

    Authors: Salman Elgamal, Ossama Obeid, Tameem Kabbani, Go Inoue, Nizar Habash

    Abstract: The widespread absence of diacritical marks in Arabic text poses a significant challenge for Arabic natural language processing (NLP). This paper explores instances of naturally occurring diacritics, referred to as "diacritics in the wild," to unveil patterns and latent information across six diverse genres: news articles, novels, children's books, poetry, political documents, and ChatGPT outputs.…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  3. arXiv:2305.14734  [pdf, other]

    cs.CL

    Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation

    Authors: Bashar Alhafni, Go Inoue, Christian Khairallah, Nizar Habash

    Abstract: Grammatical error correction (GEC) is a well-explored problem in English with many existing models and datasets. However, research on GEC in morphologically rich languages has been limited due to challenges such as data scarcity and language complexity. In this paper, we present the first results on Arabic GEC using two newly developed Transformer-based pretrained sequence-to-sequence models. We a…

    Submitted 9 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023

  4. arXiv:2211.16807  [pdf, other]

    cs.CL

    Camelira: An Arabic Multi-Dialect Morphological Disambiguator

    Authors: Ossama Obeid, Go Inoue, Nizar Habash

    Abstract: We present Camelira, a web-based Arabic multi-dialect morphological disambiguation tool that covers four major variants of Arabic: Modern Standard Arabic, Egyptian, Gulf, and Levantine. Camelira offers a user-friendly web interface that allows researchers and language learners to explore various linguistic information, such as part-of-speech, morphological features, and lemmas. Our system also pro…

    Submitted 30 November, 2022; originally announced November 2022.

  5. arXiv:2110.06852  [pdf, other]

    cs.CL

    Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

    Authors: Go Inoue, Salam Khalifa, Nizar Habash

    Abstract: We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models. Our models consistently outperform existing systems in Modern Standard Arabic and all the Arabic dialects we study, achieving 2.6% absolute improvement over the previous state-of-the-art in Modern Standard Arabic, 2.8% in Gulf, 1.6% in Egypti…

    Submitted 21 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of ACL 2022

  6. arXiv:2103.06678  [pdf, other]

    cs.CL

    The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models

    Authors: Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, Nizar Habash

    Abstract: In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models. To do so, we build three pre-trained language models across three variants of Arabic: Modern Standard Arabic (MSA), dialectal Arabic, and classical Arabic, in addition to a fourth language model which is pre-trained on a mix of the three. We also examine the imp…

    Submitted 4 September, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: Accepted to WANLP 2021