Showing 1–7 of 7 results for author: Kwon, S Y

Searching in archive cs.
  1. arXiv:2505.18436  [pdf, ps, other]

    cs.CL

    Voice of a Continent: Mapping Africa's Speech Technology Frontier

    Authors: AbdelRahim Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed

    Abstract: Africa's rich linguistic diversity remains significantly underrepresented in speech technologies, creating barriers to digital inclusion. To alleviate this challenge, we systematically map the continent's speech space of datasets and technologies, leading to a new comprehensive benchmark SimbaBench for downstream African speech tasks. Using SimbaBench, we introduce the Simba family of models, achi…

    Submitted 19 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  2. arXiv:2503.00231  [pdf, other]

    cs.CL cs.AI

    Jawaher: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking

    Authors: Samar M. Magdy, Sang Yun Kwon, Fakhraddin Alwajih, Safaa Abdelfadil, Shady Shehata, Muhammad Abdul-Mageed

    Abstract: Recent advancements in instruction fine-tuning, alignment methods such as reinforcement learning from human feedback (RLHF), and optimization techniques like direct preference optimization (DPO) have significantly enhanced the adaptability of large language models (LLMs) to user preferences. However, despite these innovations, many LLMs continue to exhibit biases toward Western, Anglo-centric, or…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: Project GitHub page is accessible at: https://github.com/UBC-NLP/jawaher

  3. arXiv:2410.18163  [pdf, other]

    cs.CL

    Gazelle: An Instruction Dataset for Arabic Writing Assistance

    Authors: Samar M. Magdy, Fakhraddin Alwajih, Sang Yun Kwon, Reem Abdel-Salam, Muhammad Abdul-Mageed

    Abstract: Writing has long been considered a hallmark of human intelligence and remains a pinnacle task for artificial intelligence (AI) due to the intricate cognitive processes involved. Recently, rapid advancements in generative AI, particularly through the development of Large Language Models (LLMs), have significantly transformed the landscape of writing assistance. However, underrepresented languages l…

    Submitted 4 November, 2024; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024 Findings camera-ready version

  4. arXiv:2312.08400  [pdf, other]

    cs.CL cs.AI

    Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction

    Authors: Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

    Abstract: Large language models (LLMs) finetuned to follow human instruction have recently exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC), especially on languages other than English, remains significantly unexplored. In this work, we evaluate the abilities of instruction finetuned LLMs in Arabic GEC, a complex task due to Ara…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: text overlap with arXiv:2308.04492

  5. arXiv:2308.04492  [pdf, other]

    cs.AI

    ChatGPT for Arabic Grammatical Error Correction

    Authors: Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

    Abstract: Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC) tasks, particularly in non-English languages, remains significantly unexplored. In this paper, we delve into abilities of instruction fine-tuned LLMs in Arabic GEC, a task made complex du…

    Submitted 8 August, 2023; originally announced August 2023.

  6. arXiv:2304.13292  [pdf, other]

    cs.CL

    Zero-Shot Slot and Intent Detection in Low-Resource Languages

    Authors: Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed

    Abstract: Intent detection and slot filling are critical tasks in spoken and natural language understanding for task-oriented dialog systems. In this work we describe our participation in the slot and intent detection for low-resource language varieties (SID4LR; Aepli et al. (2023)). We investigate the slot and intent detection (SID) tasks using a wide range of models and settings. Given the recent success…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: VarDial @ EACL

  7. arXiv:1803.07140  [pdf, other]

    cs.CV

    Visual Psychophysics for Making Face Recognition Algorithms More Explainable

    Authors: Brandon RichardWebster, So Yon Kwon, Christopher Clarizio, Samuel E. Anthony, Walter J. Scheirer

    Abstract: Scientific fields that are interested in faces have developed their own sets of concepts and procedures for understanding how a target model system (be it a person or algorithm) perceives a face under varying conditions. In computer vision, this has largely been in the form of dataset evaluation for recognition tasks where summary statistics are used to measure progress. While aggregate performanc…

    Submitted 19 July, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 20 pages, 5 figures. To appear in Proceedings of the European Conference on Computer Vision (ECCV). For supplemental material see http://bjrichardwebster.com/papers/menagerie/supp