
Showing 1–11 of 11 results for author: Acikgoz, E C

  1. arXiv:2505.01592

    cs.CL cs.AI

    PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents

    Authors: Takyoung Kim, Janvijay Singh, Shuhaib Mehri, Emre Can Acikgoz, Sagnik Mukherjee, Nimet Beyza Bozdag, Sumuk Shashidhar, Gokhan Tur, Dilek Hakkani-Tür

    Abstract: The growing capabilities of large language models (LLMs) in instruction-following and context-understanding lead to the era of agents with numerous applications. Among these, task planning agents have become especially prominent in realistic scenarios involving complex internal pipelines, such as context understanding, tool management, and response generation. However, existing benchmarks predomin…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: Preprint in progress

  2. arXiv:2504.19982

    cs.CL cs.AI

    TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons

    Authors: Emre Can Acikgoz, Carl Guo, Suvodip Dey, Akul Datta, Takyoung Kim, Gokhan Tur, Dilek Hakkani-Tür

    Abstract: Task-oriented dialogue (TOD) systems are experiencing a revolution driven by Large Language Models (LLMs), yet the evaluation methodologies for these systems remain insufficient for their growing sophistication. While traditional automatic metrics effectively assessed earlier modular systems, they focus solely on the dialogue level and cannot detect critical intermediate errors that can arise duri…

    Submitted 28 April, 2025; originally announced April 2025.

  3. arXiv:2504.16939

    cs.AI cs.CL

    A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions

    Authors: Emre Can Acikgoz, Cheng Qian, Hongru Wang, Vardhan Dongre, Xiusi Chen, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur

    Abstract: Recent advances in Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users. Yet, fundamental questions about their capabilities, limitations, and paths forward remain open. This survey paper presents a desideratum for next-generation Conversa…

    Submitted 7 April, 2025; originally announced April 2025.

  4. arXiv:2504.13958

    cs.LG cs.AI cs.CL

    ToolRL: Reward is All Tool Learning Needs

    Authors: Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji

    Abstract: Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios. Recent advancements in reinforcement learning (RL), particularly with R1-like models, have demonstrated promising reasoning and generalization abilities. Yet, reward design for tool use presents unique ch…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 19 Pages, 12 Figures, 12 Tables

  5. arXiv:2502.11435

    cs.AI cs.CL cs.LG

    SMART: Self-Aware Agent for Tool Overuse Mitigation

    Authors: Cheng Qian, Emre Can Acikgoz, Hongru Wang, Xiusi Chen, Avirup Sil, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji

    Abstract: Current Large Language Model (LLM) agents demonstrate strong reasoning and tool use capabilities, but often lack self-awareness, failing to balance these approaches effectively. This imbalance leads to Tool Overuse, where models unnecessarily rely on external tools for tasks solvable with parametric knowledge, increasing computational overhead. Inspired by human metacognition, we introduce SMART (…

    Submitted 24 May, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: 18 pages, 11 tables, 7 figures, ACL 2025 Findings

  6. arXiv:2502.08820

    cs.AI cs.CL

    Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model

    Authors: Emre Can Acikgoz, Jeremiah Greer, Akul Datta, Ze Yang, William Zeng, Oussama Elachqar, Emmanouil Koukoumidis, Dilek Hakkani-Tür, Gokhan Tur

    Abstract: Large Language Models (LLMs) with API-calling capabilities enabled building effective Language Agents (LA), while also revolutionizing the conventional task-oriented dialogue (TOD) paradigm. However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs, requiring new data to maintain their quality when interfacing with new services, while LAs ar…

    Submitted 18 February, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

  7. arXiv:2411.00927

    cs.CL cs.AI cs.HC

    ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents

    Authors: Vardhan Dongre, Xiaocheng Yang, Emre Can Acikgoz, Suvodip Dey, Gokhan Tur, Dilek Hakkani-Tür

    Abstract: Large language model (LLM)-based agents are increasingly employed to interact with external environments (e.g., games, APIs, world models) to solve user-provided tasks. However, current frameworks often lack the ability to collaborate effectively with users in fully conversational settings. Conversations are essential for aligning on task details, achieving user-defined goals, and satisfying prefe…

    Submitted 19 April, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 31 pages, 10 Figures, 25 Tables

  8. arXiv:2405.04685

    cs.CL cs.AI cs.LG

    Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking

    Authors: Emre Can Acikgoz, Mete Erdogan, Deniz Yuret

    Abstract: Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages. This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations, with a special focus on Turkish. We conduct an in-depth analysis to evaluate the impact of…

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2404.16621

    cs.LG cs.AI cs.CL

    Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

    Authors: Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anıl Boz, İlker Kesen, Aykut Erdem, Erkut Erdem

    Abstract: The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet, the progression of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for…

    Submitted 25 April, 2024; originally announced April 2024.

  10. arXiv:2311.07022

    cs.CL cs.AI cs.CV

    ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

    Authors: Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, Erkut Erdem

    Abstract: With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm foo…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Preprint. 48 pages, 22 figures, 10 tables

  11. arXiv:2211.01736

    cs.CL cs.AI cs.LG

    Transformers on Multilingual Clause-Level Morphology

    Authors: Emre Can Acikgoz, Tilek Chubakov, Müge Kural, Gözde Gül Şahin, Deniz Yuret

    Abstract: This paper describes our winning systems in MRL: The 1st Shared Task on Multilingual Clause-level Morphology (EMNLP 2022 Workshop) designed by KUIS AI NLP team. We present our work for all three parts of the shared task: inflection, reinflection, and analysis. We mainly explore transformers with two approaches: (i) training models from scratch in combination with data augmentation, and (ii) transf…

    Submitted 13 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.