Showing 1–18 of 18 results for author: Ezzini, S

  1. arXiv:2506.08768  [pdf, ps, other]

    cs.CL

    AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP

    Authors: Ahmed Hasanaath, Aisha Alansari, Ahmed Ashraf, Chafik Salmane, Hamzah Luqman, Saad Ezzini

    Abstract: Large language models (LLMs) have shown remarkable progress in reasoning abilities and general natural language processing (NLP) tasks, yet their performance on Arabic data, characterized by rich morphology, diverse dialects, and complex script, remains underexplored. This paper presents a comprehensive benchmarking study of multiple reasoning-focused LLMs, with a special emphasis on the newly int…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  2. arXiv:2503.24102  [pdf, ps, other]

    cs.CL

    Is LLM the Silver Bullet to Low-Resource Languages Machine Translation?

    Authors: Yewei Song, Lujun Li, Cedric Lothritz, Saad Ezzini, Lama Sleem, Niccolo Gentile, Radu State, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Low-Resource Languages (LRLs) present significant challenges in natural language processing due to their limited linguistic resources and underrepresentation in standard datasets. While recent advances in Large Language Models (LLMs) and Neural Machine Translation have substantially improved translation capabilities for high-resource languages, performance disparities persist for LRLs, particularl…

    Submitted 5 June, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  3. arXiv:2501.11498  [pdf, other]

    cs.SE cs.AI cs.CL cs.DB

    Dialect2SQL: A Novel Text-to-SQL Dataset for Arabic Dialects with a Focus on Moroccan Darija

    Authors: Salmane Chafik, Saad Ezzini, Ismail Berrada

    Abstract: The task of converting natural language questions (NLQs) into executable SQL queries, known as text-to-SQL, has gained significant interest in recent years, as it enables non-technical users to interact with relational databases. Many benchmarks, such as SPIDER and WikiSQL, have contributed to the development of new models and the evaluation of their performance. In addition, other datasets, like…

    Submitted 20 January, 2025; originally announced January 2025.

  4. arXiv:2501.05255  [pdf, other]

    cs.SE cs.CL

    CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing

    Authors: Yewei Song, Xunzhu Tang, Cedric Lothritz, Saad Ezzini, Jacques Klein, Tegawendé F. Bissyandé, Andrey Boytsov, Ulrick Ble, Anne Goujon

    Abstract: API-driven chatbot systems are increasingly integral to software engineering applications, yet their effectiveness hinges on accurately generating and executing API calls. This is particularly challenging in scenarios requiring multi-step interactions with complex parameterization and nested API dependencies. Addressing these challenges, this work contributes to the evaluation and assessment of AI…

    Submitted 24 April, 2025; v1 submitted 9 January, 2025; originally announced January 2025.

    Journal ref: The 29th International Conference on Evaluation and Assessment in Software Engineering (EASE 2025)

  5. arXiv:2408.11788  [pdf, other]

    cs.AI cs.CL cs.CV cs.SE

    DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework

    Authors: Zhifei Xie, Daniel Tang, Dingwei Tan, Jacques Klein, Tegawendé F. Bissyandé, Saad Ezzini

    Abstract: Current video generation models excel at creating short, realistic clips, but struggle with longer, multi-scene videos. We introduce DreamFactory, an LLM-based framework that tackles this challenge. DreamFactory leverages multi-agent collaboration principles and a Key Frames Iteration Design Method to ensure consistency and style across long videos. It utilizes Chain of Thought (…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

    MSC Class: TsingHua University

  6. arXiv:2407.09818  [pdf, other]

    cs.CL

    AraFinNLP 2024: The First Arabic Financial NLP Shared Task

    Authors: Sanad Malaysha, Mo El-Haj, Saad Ezzini, Mohammed Khalilia, Mustafa Jarrar, Sultan Almujaiwel, Ismail Berrada, Houda Bouamor

    Abstract: The expanding financial markets of the Arab world require sophisticated Arabic NLP tools. To address this need within the banking domain, the Arabic Financial NLP (AraFinNLP) shared task proposes two subtasks: (i) Multi-dialect Intent Detection and (ii) Cross-dialect Translation and Intent Preservation. This shared task uses the updated ArBanking77 dataset, which includes about 39k parallel querie…

    Submitted 13 July, 2024; originally announced July 2024.

  7. arXiv:2405.16482  [pdf, other]

    cs.CL

    DarijaBanking: A New Resource for Overcoming Language Barriers in Banking Intent Detection for Moroccan Arabic Speakers

    Authors: Abderrahman Skiredj, Ferdaous Azhari, Ismail Berrada, Saad Ezzini

    Abstract: Navigating the complexities of language diversity is a central challenge in developing robust natural language processing systems, especially in specialized domains like banking. The Moroccan Dialect (Darija) serves as the common language that blends cultural complexities, historical impacts, and regional differences. The complexities of Darija present a special set of challenges for language mode…

    Submitted 26 May, 2024; originally announced May 2024.

  8. arXiv:2402.02172  [pdf, other]

    cs.SE

    CodeAgent: Autonomous Communicative Agents for Code Review

    Authors: Xunzhu Tang, Kisub Kim, Yewei Song, Cedric Lothritz, Bei Li, Saad Ezzini, Haoye Tian, Jacques Klein, Tegawende F. Bissyande

    Abstract: Code review, which aims at ensuring the overall quality and reliability of software, is a cornerstone of software development. Unfortunately, while crucial, code review is a labor-intensive process that the research community is looking to automate. Existing automated methods rely on single input-output generative models and thus generally struggle to emulate the collaborative nature of code revie…

    Submitted 24 September, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  9. arXiv:2312.14725  [pdf, other]

    cs.SE

    Enhancing Text-to-SQL Translation for Financial System Design

    Authors: Yewei Song, Saad Ezzini, Xunzhu Tang, Cedric Lothritz, Jacques Klein, Tegawendé Bissyandé, Andrey Boytsov, Ulrick Ble, Anne Goujon

    Abstract: Text-to-SQL, the task of translating natural language questions into SQL queries, is part of various business processes. Its automation, which is an emerging challenge, will empower software practitioners to seamlessly interact with relational databases using natural language, thereby bridging the gap between business needs and software capabilities. In this paper, we consider Large Language Model…

    Submitted 8 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: 10 pages, ICSE-SEIP 2024

  10. arXiv:2312.01241  [pdf, other]

    cs.CR cs.AI

    Just-in-Time Detection of Silent Security Patches

    Authors: Xunzhu Tang, Zhenghan Chen, Kisub Kim, Haoye Tian, Saad Ezzini, Jacques Klein

    Abstract: Open-source code is pervasive. In this setting, embedded vulnerabilities are spreading to downstream software at an alarming rate. While such vulnerabilities are generally identified and addressed rapidly, inconsistent maintenance policies may lead security patches to go unnoticed. Indeed, security patches can be silent, i.e., they do not always come with comprehensive advisories such as CVE…

    Submitted 25 November, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

  11. arXiv:2310.12753   

    cs.SE

    Patch-CLIP: A Patch-Text Pre-Trained Model

    Authors: Xunzhu Tang, Zhenghan Chen, Saad Ezzini, Haoye Tian, Jacques Klein, Tegawende F. Bissyande

    Abstract: In recent years, patch representation learning has emerged as a necessary research direction for exploiting the capabilities of machine learning in software generation. These representations have driven significant performance enhancements across a variety of tasks involving code changes. While the progress is undeniable, a common limitation among existing models is their specialization: they pred…

    Submitted 30 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: The paper is incomplete, causing much confusion for the community

  12. arXiv:2308.16586  [pdf, other]

    cs.SE

    Learning to Represent Patches

    Authors: Xunzhu Tang, Haoye Tian, Zhenghan Chen, Weiguo Pian, Saad Ezzini, Abdoul Kader Kabore, Andrew Habib, Jacques Klein, Tegawende F. Bissyande

    Abstract: Patch representation is crucial in automating various software engineering tasks, like determining patch accuracy or summarizing code changes. While recent research has employed deep learning for patch representation, focusing on token sequences or Abstract Syntax Trees (ASTs), they often miss the change's semantic intent and the context of modified lines. To bridge this gap, we introduce a novel…

    Submitted 3 October, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

  13. arXiv:2308.15234  [pdf, other]

    cs.SE

    Hyperbolic Code Retrieval: A Novel Approach for Efficient Code Search Using Hyperbolic Space Embeddings

    Authors: Xunzhu Tang, Zhenghan Chen, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, Tegawende F. Bissyande

    Abstract: Within the realm of advanced code retrieval, existing methods have primarily relied on intricate matching and attention-based mechanisms. However, these methods often lead to computational and memory inefficiencies, posing a significant challenge to their real-world applicability. To tackle this challenge, we propose a novel approach, the Hyperbolic Code QA Matching (HyCoQA). This approach leverag…

    Submitted 29 August, 2023; originally announced August 2023.

  14. arXiv:2308.15233  [pdf, other]

    cs.SE

    Multilevel Semantic Embedding of Software Patches: A Fine-to-Coarse Grained Approach Towards Security Patch Detection

    Authors: Xunzhu Tang, Zhenghan Chen, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, Tegawende F. Bissyande

    Abstract: The growth of open-source software has increased the risk of hidden vulnerabilities that can affect downstream software applications. This concern is further exacerbated by software vendors' practice of silently releasing security patches without explicit warnings or common vulnerability and exposure (CVE) notifications. This lack of transparency leaves users unaware of potential security threats,…

    Submitted 29 August, 2023; originally announced August 2023.

  15. arXiv:2303.01347  [pdf, other]

    cs.CL cs.SE

    Letz Translate: Low-Resource Machine Translation for Luxembourgish

    Authors: Yewei Song, Saad Ezzini, Jacques Klein, Tegawende Bissyande, Clément Lefebvre, Anne Goujon

    Abstract: Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research in multilingual models has shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in co…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: The associated model is published on HuggingFace: https://huggingface.co/etamin/Letz-Translate-OPUS-LB-EN The Dictionary used in this paper is available in Github: https://github.com/Etamin/Ltz_dictionary

  16. arXiv:2302.04793  [pdf, other]

    cs.SE

    AI-based Question Answering Assistance for Analyzing Natural-language Requirements

    Authors: Saad Ezzini, Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh

    Abstract: By virtue of being prevalently written in natural language (NL), requirements are prone to various defects, e.g., inconsistency and incompleteness. As such, requirements are frequently subject to quality assurance processes. These processes, when carried out entirely manually, are tedious and may further overlook important quality issues due to time and budget pressures. In this paper, we propose…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: This paper has been accepted at the 45th International Conference on Software Engineering (ICSE 2023)

  17. arXiv:2206.10227  [pdf, other]

    cs.SE cs.CL

    TAPHSIR: Towards AnaPHoric Ambiguity Detection and ReSolution In Requirements

    Authors: Saad Ezzini, Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh

    Abstract: We introduce TAPHSIR, a tool for anaphoric ambiguity detection and anaphora resolution in requirements. TAPHSIR facilitates reviewing the use of pronouns in a requirements specification and revising those pronouns that can lead to misunderstandings during the development process. To this end, TAPHSIR detects the requirements which have potential anaphoric ambiguity and further attempts interpreting…

    Submitted 21 June, 2022; originally announced June 2022.

  18. arXiv:2206.10218  [pdf, other]

    cs.SE

    WikiDoMiner: Wikipedia Domain-specific Miner

    Authors: Saad Ezzini, Sallam Abualhaija, Mehrdad Sabetzadeh

    Abstract: We introduce WikiDoMiner, a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by…

    Submitted 21 June, 2022; originally announced June 2022.