
Showing 1–4 of 4 results for author: Lopo, J A

  1. arXiv:2506.12450  [pdf, ps, other]

    cs.CL

    Language Surgery in Multilingual Large Language Models

    Authors: Joanito Agili Lopo, Muhammad Ravi Shulthan Habibi, Tack Hwa Wong, Muhammad Ilham Ghozali, Fajri Koto, Genta Indra Winata, Peerat Limkonchotiwat, Alham Fikri Aji, Samuel Cahyawijaya

    Abstract: Large Language Models (LLMs) have demonstrated remarkable generalization capabilities across tasks and languages, revolutionizing natural language processing. This paper investigates the naturally emerging representation alignment in LLMs, particularly in the middle layers, and its implications for disentangling language-specific and language-agnostic information. We empirically confirm the existe…

    Submitted 14 June, 2025; originally announced June 2025.

  2. arXiv:2408.08805  [pdf, other]

    cs.CL cs.AI

    CIKMar: A Dual-Encoder Approach to Prompt-Based Reranking in Educational Dialogue Systems

    Authors: Joanito Agili Lopo, Marina Indah Prasasti, Alma Permatasari

    Abstract: In this study, we introduce CIKMar, an efficient approach to educational dialogue systems powered by the Gemma language model. By leveraging a Dual-Encoder ranking system that incorporates both BERT and SBERT models, we have designed CIKMar to deliver highly relevant and accurate responses, even with the constraints of a smaller language model size. Our evaluation reveals that CIKMar achieves a rob…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: This paper is the result of the final project of the Natural Language Processing course, Master of Artificial Intelligence, Universitas Gadjah Mada

  3. arXiv:2406.10118  [pdf, other]

    cs.CL

    SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

    Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze Gao, Patrick Amadeus, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, et al. (36 additional authors not shown)

    Abstract: Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due t…

    Submitted 10 March, 2025; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: https://seacrowd.github.io/ ; published in EMNLP 2024

  4. arXiv:2404.01009  [pdf, other]

    cs.CL

    Constructing and Expanding Low-Resource and Underrepresented Parallel Datasets for Indonesian Local Languages

    Authors: Joanito Agili Lopo, Radius Tanone

    Abstract: In Indonesia, local languages play an integral role in the culture. However, the available Indonesian language resources still fall into the category of limited data in the Natural Language Processing (NLP) field. This becomes problematic when building NLP models for these languages. To address this gap, we introduce Bhinneka Korpus, a multilingual parallel corpus featuring five Indonesian local lan…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Submitted for consideration at EAMT 2024. Results pending.