Showing 1–3 of 3 results for author: Dharmasiri, D

Searching in archive cs.
  1. arXiv:2504.00048

    cs.CL cs.AI

    Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs

    Authors: Cong Duy Vu Hoang, Gioacchino Tangari, Clemence Lanfranchi, Dalu Guo, Paul Cayet, Steve Siu, Don Dharmasiri, Yuan-Fang Li, Long Duong, Damien Hilloulin, Rhicheek Patra, Sungpack Hong, Hassan Chafi

    Abstract: The growing adoption of large language models (LLMs) in business applications has amplified interest in Natural Language to SQL (NL2SQL) solutions, in which there is competing demand for high performance and efficiency. Domain- and customer-specific requirements further complicate the problem. To address this conundrum, we introduce Distill-C, a distilled customization framework tailored for NL2SQ…

    Submitted 30 March, 2025; originally announced April 2025.

    Comments: Preprint, accepted at NAACL 2025 (Industry Track)

  2. arXiv:2502.16747

    cs.CL cs.AI cs.LG cs.SE

    SQLong: Enhanced NL2SQL for Longer Contexts with LLMs

    Authors: Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Thanh Tien Vu, Don Dharmasiri, Yuan-Fang Li, Long Duong

    Abstract: Open-weight large language models (LLMs) have significantly advanced performance in the Natural Language to SQL (NL2SQL) task. However, their effectiveness diminishes when dealing with large database schemas, as the context length increases. To address this limitation, we present SQLong, a novel and efficient data augmentation framework designed to enhance LLM performance in long-context scenarios…

    Submitted 20 May, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

    Comments: Accepted to Table Representation Learning Workshop at ACL 2025

  3. arXiv:2411.00005

    cs.SE cs.AI

    Mastering the Craft of Data Synthesis for CodeLLMs

    Authors: Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li

    Abstract: Large language models (LLMs) have shown impressive performance in code understanding and generation, making coding tasks a key focus for researchers due to their practical applications and value as a testbed for LLM evaluation. Data synthesis and filtering techniques have been widely adopted and shown to be highly effective in this context. In this paper, we present a focused survey and tax…

    Submitted 7 February, 2025; v1 submitted 16 October, 2024; originally announced November 2024.

    Comments: Accepted at NAACL 2025