Showing 1–50 of 85 results for author: Rajmohan, S
  1. arXiv:2506.08669  [pdf, ps, other]

    cs.LG cs.AI

    Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search

    Authors: Dongge Han, Menglin Xia, Daniel Madrigal Diaz, Samuel Kessler, Ankur Mallick, Xuchao Zhang, Mirian Del Carmen Hipolito Garcia, Jin Xu, Victor Rühle, Saravan Rajmohan

    Abstract: Small language models (SLMs) offer promising and efficient alternatives to large language models (LLMs). However, SLMs' limited capacity restricts their reasoning capabilities and makes them sensitive to prompt variations. To address these challenges, we propose a novel framework that enhances SLM reasoning capabilities through LLM-generated blueprints. The blueprints provide structured, high-leve…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: TTODLer-FM Workshop@ICML'25 (Tiny Titans: The next wave of On-Device Learning for Foundational Models)

  2. arXiv:2505.23419  [pdf, ps, other]

    cs.SE cs.AI cs.CL

    SWE-bench Goes Live!

    Authors: Linghao Zhang, Shilin He, Chaoyun Zhang, Yu Kang, Bowen Li, Chengxing Xie, Junhao Wang, Maoquan Wang, Yufan Huang, Shengyu Fu, Elsie Nallipogu, Qingwei Lin, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang

    Abstract: The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a critical benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench and its variants have become standard in this domain, they suffer from key limitations: they have not been updated since their initial releases, cover a narrow set of repositories, and depend heavily o…

    Submitted 1 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: Homepage: https://swe-bench-live.github.io/, Code: https://github.com/SWE-bench-Live, Dataset: https://huggingface.co/SWE-bench-Live

  3. arXiv:2505.22338  [pdf, ps, other]

    cs.CL cs.AI

    Text2Grad: Reinforcement Learning from Natural Language Feedback

    Authors: Hanyang Wang, Lu Wang, Chaoyun Zhang, Tianjun Mao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Traditional RLHF optimizes language models with coarse, scalar rewards that mask the fine-grained reasons behind success or failure, leading to slow and opaque learning. Recent work augments RL with textual critiques through prompting or reflection, improving interpretability but leaving model parameters untouched. We introduce Text2Grad, a reinforcement-learning paradigm that turns free-form text…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: The code for our method is available at https://github.com/microsoft/Text2Grad

  4. arXiv:2505.11271  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models

    Authors: Camille Couturier, Spyros Mastorakis, Haiying Shen, Saravan Rajmohan, Victor Rühle

    Abstract: Large Language Models (LLMs) are increasingly deployed across edge and cloud platforms for real-time question-answering and retrieval-augmented generation. However, processing lengthy contexts in distributed systems incurs high computational overhead, memory usage, and network bandwidth. This paper introduces a novel semantic caching approach for storing and reusing intermediate contextual summari…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Preprint. Paper accepted at ICCCN 2025, the final version will appear in the proceedings

    ACM Class: I.2.7

  5. arXiv:2505.00742  [pdf, other]

    cs.CV cs.AI eess.IV

    Zoomer: Adaptive Image Focus Optimization for Black-box MLLM

    Authors: Jiaxu Qian, Chendong Wang, Yifan Yang, Chaoyun Zhang, Huiqiang Jiang, Xufang Luo, Yu Kang, Qingwei Lin, Anlan Zhang, Shiqi Jiang, Ting Cao, Tianjun Mao, Suman Banerjee, Guyue Liu, Saravan Rajmohan, Dongmei Zhang, Yuqing Yang, Qi Zhang, Lili Qiu

    Abstract: Recent advancements in multimodal large language models (MLLMs) have broadened the scope of vision-language tasks, excelling in applications like image captioning and interactive question-answering. However, these models struggle with accurately processing visual data, particularly in tasks requiring precise object recognition and fine visual details. Stringent token limits often result in the omi…

    Submitted 29 April, 2025; originally announced May 2025.

  6. arXiv:2504.16871  [pdf, other]

    cs.LG

    Exploring How LLMs Capture and Represent Domain-Specific Knowledge

    Authors: Mirian Hipolito Garcia, Camille Couturier, Daniel Madrigal Diaz, Ankur Mallick, Anastasios Kyrillidis, Robert Sim, Victor Ruhle, Saravan Rajmohan

    Abstract: We study whether Large Language Models (LLMs) inherently capture domain-specific nuances in natural language. Our experiments probe the domain sensitivity of LLMs by examining their ability to distinguish queries from different domains using hidden states generated during the prefill phase. We reveal latent domain-related trajectories that indicate the model's internal recognition of query domains…

    Submitted 24 April, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  7. arXiv:2504.15188  [pdf, other]

    cs.AI

    Synergistic Weak-Strong Collaboration by Aligning Preferences

    Authors: Yizhu Jiao, Xuchao Zhang, Zhaoyang Wang, Yubo Ma, Zhun Deng, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Jiawei Han, Huaxiu Yao

    Abstract: Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. Fine-tuning large models for every niche application is often infeasible due to black-box constraints and high computational overhead. To address this, we propose a collaborative framework that pairs a specialized weak model with a general strong m…

    Submitted 22 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

  8. arXiv:2504.14603  [pdf, other]

    cs.AI cs.HC cs.OS

    UFO2: The Desktop AgentOS

    Authors: Chaoyun Zhang, He Huang, Chiming Ni, Jian Mu, Si Qin, Shilin He, Lu Wang, Fangkai Yang, Pu Zhao, Chao Du, Liqun Li, Yu Kang, Zhao Jiang, Suzhen Zheng, Rujia Wang, Jiaxu Qian, Minghua Ma, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution. We present UFO2, a multiagent AgentOS for Windows deskto…

    Submitted 25 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: The source code of UFO2 is publicly available at https://github.com/microsoft/UFO/, with comprehensive documentation provided at https://microsoft.github.io/UFO/

  9. arXiv:2504.11505  [pdf, other]

    cs.SE

    eARCO: Efficient Automated Root Cause Analysis with Prompt Optimization

    Authors: Drishti Goel, Raghav Magazine, Supriyo Ghosh, Akshay Nambi, Prathamesh Deshpande, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan

    Abstract: Root cause analysis (RCA) for incidents in large-scale cloud systems is a complex, knowledge-intensive task that often requires significant manual effort from on-call engineers (OCEs). Improving RCA is vital for accelerating the incident resolution process and reducing service downtime and manual efforts. Recent advancements in Large-Language Models (LLMs) have proven to be effective in solving di…

    Submitted 15 April, 2025; originally announced April 2025.

  10. arXiv:2504.08865  [pdf, other]

    cs.DC

    An Empirical Study of Production Incidents in Generative AI Cloud Services

    Authors: Haoran Yan, Yinfang Chen, Minghua Ma, Ming Wen, Shan Lu, Shenglin Zhang, Tianyin Xu, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Chaoyun Zhang, Dongmei Zhang

    Abstract: The ever-increasing demand for generative artificial intelligence (GenAI) has motivated cloud-based GenAI services such as Azure OpenAI Service and Amazon Bedrock. Like any large-scale cloud service, failures are inevitable in cloud-based GenAI services, resulting in user dissatisfaction and significant monetary losses. However, GenAI cloud services, featured by their massive parameter scales, har…

    Submitted 11 April, 2025; originally announced April 2025.

  11. arXiv:2502.19557  [pdf, other]

    cs.CL cs.AI

    Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?

    Authors: Yudi Zhang, Lu Wang, Meng Fang, Yali Du, Chenghua Huang, Jun Wang, Qingwei Lin, Mykola Pechenizkiy, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Distilling large language models (LLMs) typically involves transferring the teacher model's responses through supervised fine-tuning (SFT). However, this approach neglects the potential to distill both data (output content) and reward signals (quality evaluations). Extracting reliable reward signals directly from teacher models is challenging, as LLMs are optimized for generation rather than evalu…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 14 pages, 7 figures

  12. arXiv:2502.18906  [pdf, other]

    cs.LG

    VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

    Authors: Jiani Zheng, Lu Wang, Fangkai Yang, Chaoyun Zhang, Lingrui Mei, Wenjie Yin, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Training Vision-Language Models (VLMs) for Graphical User Interfaces (GUI) agents via Reinforcement Learning (RL) faces critical challenges: environment-based RL requires costly interactions, while environment-free methods struggle with distribution shift and reward generalization. We propose an environment-free RL framework that decouples value estimation from policy optimization by leveraging a…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 20 pages, 5 figures

  13. arXiv:2502.18293  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AMPO: Active Multi-Preference Optimization for Self-play Preference Selection

    Authors: Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan

    Abstract: Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, thereby enabling richer training signals for large language models. During self-play alignment, these models often produce numerous candidate answers per query, rendering it computationally infeasible to include all responses in the training obj…

    Submitted 8 June, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Comments: Accepted at ICML 2025

  14. arXiv:2502.16944  [pdf, other]

    cs.LG cs.AI

    Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance

    Authors: Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Zhixu Li, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Proximal Policy Optimization (PPO)-based Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human preferences. It requires joint training of an actor and critic with a pretrained, fixed reward model for guidance. This approach increases computational complexity and instability due to actor-critic interdependence. Additionally, PPO lacks ac…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 16 pages, 3 figures

  15. arXiv:2502.14617  [pdf, other]

    cs.DC

    Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale

    Authors: Shashwat Jaiswal, Kunal Jain, Yogesh Simmhan, Anjaly Parayil, Ankur Mallick, Rujia Wang, Renee St. Amant, Chetan Bansal, Victor Rühle, Anoop Kulkarni, Steve Kofsky, Saravan Rajmohan

    Abstract: Large Language Model (LLM) inference workloads handled by global cloud providers can include both latency-sensitive and insensitive tasks, creating a diverse range of Service Level Agreement (SLA) requirements. Managing these mixed workloads is challenging due to the complexity of the inference stack, which includes multiple LLMs, hardware configurations, and geographic distributions. Current opti…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 15 pages, 17 figures, 2 tables

  16. arXiv:2502.04376  [pdf, other]

    cs.CL cs.AI

    MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf

    Authors: Lingxiang Hu, Shurun Yuan, Xiaoting Qin, Jue Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: In contemporary workplaces, meetings are essential for exchanging ideas and ensuring team alignment but often face challenges such as time consumption, scheduling conflicts, and inefficient participation. Recent advancements in Large Language Models (LLMs) have demonstrated their strong capabilities in natural language generation and reasoning, prompting the question: can LLMs effectively delegate…

    Submitted 5 February, 2025; originally announced February 2025.

  17. arXiv:2502.03358  [pdf, ps, other]

    cs.CL

    Minerva: A Programmable Memory Test Benchmark for Language Models

    Authors: Menglin Xia, Victor Ruehle, Saravan Rajmohan, Reza Shokri

    Abstract: How effectively can LLM-based AI assistants utilize their memory (context) to perform various tasks? Traditional data benchmarks, which are often manually crafted, suffer from several limitations: they are static, susceptible to overfitting, difficult to interpret, and lack actionable insights--failing to pinpoint the specific capabilities a model lacks when it does not pass a test. In this paper,…

    Submitted 9 June, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: ICML 2025

  18. arXiv:2501.19056  [pdf, other]

    cs.SE cs.AI cs.CL cs.MA

    Enabling Autonomic Microservice Management through Self-Learning Agents

    Authors: Fenglin Yu, Fangkai Yang, Xiaoting Qin, Zhiyang Zhang, Jue Zhang, Qingwei Lin, Hongyu Zhang, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: The increasing complexity of modern software systems necessitates robust autonomic self-management capabilities. While Large Language Models (LLMs) demonstrate potential in this domain, they often face challenges in adapting their general knowledge to specific service contexts. To address this limitation, we propose ServiceOdyssey, a self-learning agent system that autonomously manages microservic…

    Submitted 31 January, 2025; originally announced January 2025.

  19. arXiv:2501.18460  [pdf, other]

    cs.SE

    ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

    Authors: Minghua He, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only learn the contextual semantics of code during pre-training, neglecting executability information closely related to the execution state of the code, which results…

    Submitted 30 January, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

  20. arXiv:2501.16050  [pdf, other]

    cs.SE cs.AI

    Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation

    Authors: Xing Zhang, Jiaheng Wen, Fangkai Yang, Pu Zhao, Yu Kang, Junhao Wang, Maoquan Wang, Yufan Huang, Elsie Nallipogu, Qingwei Lin, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: The advancement of large language models has intensified the need to modernize enterprise applications and migrate legacy systems to secure, versatile languages. However, existing code translation benchmarks primarily focus on individual functions, overlooking the complexities involved in translating entire repositories, such as maintaining inter-module coherence and managing dependencies. While s…

    Submitted 27 January, 2025; originally announced January 2025.

  21. arXiv:2501.13699  [pdf, other]

    cs.CL cs.SE

    DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale

    Authors: Linghao Zhang, Junhao Wang, Shilin He, Chaoyun Zhang, Yu Kang, Bowen Li, Jiaheng Wen, Chengxing Xie, Maoquan Wang, Yufan Huang, Elsie Nallipogu, Qingwei Lin, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Large Language Models have advanced automated software development; however, it remains a challenge to correctly infer dependencies, namely, identifying the internal components and external packages required for a repository to successfully run. Existing studies highlight that dependency-related issues cause over 40% of observed runtime errors on the generated repository. To address this, we intr…

    Submitted 23 January, 2025; originally announced January 2025.

  22. Coach: Exploiting Temporal Patterns for All-Resource Oversubscription in Cloud Platforms

    Authors: Benjamin Reidys, Pantea Zardoshti, Íñigo Goiri, Celine Irvene, Daniel S. Berger, Haoran Ma, Kapil Arya, Eli Cortez, Taylor Stark, Eugene Bak, Mehmet Iyigun, Stanko Novaković, Lisa Hsu, Karel Trueba, Abhisek Pan, Chetan Bansal, Saravan Rajmohan, Jian Huang, Ricardo Bianchini

    Abstract: Cloud platforms remain underutilized despite multiple proposals to improve their utilization (e.g., disaggregation, harvesting, and oversubscription). Our characterization of the resource utilization of virtual machines (VMs) in Azure reveals that, while CPU is the main underutilized resource, we need to provide a solution to manage all resources holistically. We also observe that many VMs exhibit…

    Submitted 19 March, 2025; v1 submitted 19 January, 2025; originally announced January 2025.

    Comments: To appear in 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS'25). 15 pages

  23. arXiv:2501.06706  [pdf, other]

    cs.AI cs.DC cs.MA cs.SE

    AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds

    Authors: Yinfang Chen, Manish Shetty, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Jonathan Mace, Chetan Bansal, Rujia Wang, Saravan Rajmohan

    Abstract: AI for IT Operations (AIOps) aims to automate complex operational tasks, such as fault localization and root cause analysis, to reduce human workload and minimize customer impact. While traditional DevOps tools and AIOps algorithms often focus on addressing isolated operational tasks, recent advances in Large Language Models (LLMs) and AI agents are revolutionizing AIOps by enabling end-to-end and…

    Submitted 11 January, 2025; originally announced January 2025.

  24. arXiv:2412.17395  [pdf, other]

    cs.CL

    WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models

    Authors: Huawen Feng, Pu Zhao, Qingfeng Sun, Can Xu, Fangkai Yang, Lu Wang, Qianli Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Despite recent progress achieved by code large language models (LLMs), their remarkable abilities are largely dependent on fine-tuning on high-quality data, posing challenges for data collection and annotation. To address this, current methods often design various data flywheels to collect complex code instructions, enabling models to handle more intricate tasks. However, these approaches typi…

    Submitted 18 February, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

  25. arXiv:2412.16378  [pdf, other]

    cs.LG cs.AI cs.CL

    REFA: Reference Free Alignment for multi-preference optimization

    Authors: Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan

    Abstract: We introduce REFA, a family of reference-free alignment methods that optimize over multiple user preferences while enforcing fine-grained length control. Our approach integrates deviation-based weighting to emphasize high-quality responses, length normalization to prevent trivial short-response solutions, and an EOS-probability regularizer to mitigate dataset-induced brevity biases. The…

    Submitted 24 February, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

  26. arXiv:2412.11077  [pdf, other]

    cs.CV

    Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

    Authors: Yuanmin Tang, Xiaoting Qin, Jue Zhang, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Ling, Saravan Rajmohan, Dongmei Zhang, Qi Wu

    Abstract: Composed Image Retrieval (CIR) aims to retrieve target images that closely resemble a reference image while integrating user-specified textual modifications, thereby capturing user intent more precisely. Existing training-free zero-shot CIR (ZS-CIR) methods often employ a two-stage process: they first generate a caption for the reference image and then use Large Language Models for reasoning to ob…

    Submitted 19 December, 2024; v1 submitted 15 December, 2024; originally announced December 2024.

  27. arXiv:2412.10047  [pdf, other]

    cs.AI

    Large Action Models: From Inception to Implementation

    Authors: Lu Wang, Fangkai Yang, Chaoyun Zhang, Junting Lu, Jiaxu Qian, Shilin He, Pu Zhao, Bo Qiao, Ray Huang, Si Qin, Qisheng Su, Jiayi Ye, Yudi Zhang, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dy…

    Submitted 13 January, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: 25 pages, 12 figures

  28. arXiv:2412.08585  [pdf, other]

    cs.LG cs.AI cs.AR

    TurboAttention: Efficient Attention Approximation For High Throughputs LLMs

    Authors: Hao Kang, Srikant Bharadwaj, James Hensman, Tushar Krishna, Victor Ruhle, Saravan Rajmohan

    Abstract: Large language model (LLM) inference demands a significant amount of computation and memory, especially in the key attention mechanism. While techniques such as quantization and acceleration algorithms like FlashAttention have improved the efficiency of overall inference, they address different aspects of the problem: quantization focuses on weight-activation operations, while FlashAttention impr…

    Submitted 17 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

  29. arXiv:2412.04628  [pdf, other]

    cs.LG cs.AI cs.CL

    SWEPO: Simultaneous Weighted Preference Optimization for Group Contrastive Alignment

    Authors: Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan

    Abstract: Direct Preference Optimization (DPO) has proven effective in aligning large language models with human preferences but is often constrained to pairwise comparisons -- overlooking additional positive and negative responses that are commonly available in real-world settings. We propose Simultaneous Weighted Preference Optimization (SWEPO), which incorporates multiple responses per query and prioriti…

    Submitted 21 February, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

  30. arXiv:2411.18279  [pdf, other]

    cs.AI cs.CL cs.HC

    Large Language Model-Brained GUI Agents: A Survey

    Authors: Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Guyue Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a n…

    Submitted 6 May, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: The collection of papers reviewed in this survey will be hosted and regularly updated on the GitHub repository: https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey. Additionally, a searchable webpage is available at https://aka.ms/gui-agent for easier access and exploration.

  31. arXiv:2411.15997  [pdf, other]

    cs.LG cs.AI cs.DC cs.MA

    Ensuring Fair LLM Serving Amid Diverse Applications

    Authors: Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renèe St. Amant, Rujia Wang, Yue Cheng, Ali R. Butt, Victor Rühle, Chetan Bansal, Saravan Rajmohan

    Abstract: In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, making them unsuitable for such platforms. To addre…

    Submitted 24 November, 2024; originally announced November 2024.

  32. arXiv:2411.08768  [pdf, other]

    cs.CV cs.AI

    Sharingan: Extract User Action Sequence from Desktop Recordings

    Authors: Yanting Chen, Yi Ren, Xiaoting Qin, Jue Zhang, Kehong Yuan, Lu Han, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Video recordings of user activities, particularly desktop recordings, offer a rich source of data for understanding user behaviors and automating processes. However, despite advancements in Vision-Language Models (VLMs) and their increasing use in video analysis, extracting user actions from desktop recordings remains an underexplored area. This paper addresses this gap by proposing two novel VLM-…

    Submitted 13 November, 2024; originally announced November 2024.

  33. arXiv:2411.03349  [pdf, other]

    cs.AI cs.CL cs.LG

    RuAG: Learned-rule-augmented Generation for Large Language Models

    Authors: Yudi Zhang, Pei Xiao, Lu Wang, Chaoyun Zhang, Meng Fang, Yali Du, Yevgeniy Puzyrev, Randolph Yao, Si Qin, Qingwei Lin, Mykola Pechenizkiy, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: In-context learning (ICL) and Retrieval-Augmented Generation (RAG) have gained attention for their ability to enhance LLMs' reasoning by incorporating external knowledge but suffer from limited contextual window size, leading to insufficient information injection. To this end, we propose a novel framework, RuAG, to automatically distill large volumes of offline data into interpretable first-order…

    Submitted 3 November, 2024; originally announced November 2024.

  34. arXiv:2411.00722  [pdf, other]

    cs.LG

    Token-level Proximal Policy Optimization for Query Generation

    Authors: Yichen Ouyang, Lu Wang, Fangkai Yang, Pu Zhao, Chenghua Huang, Jianfeng Liu, Bochen Pang, Yaming Yang, Yuefeng Zhan, Hao Sun, Qingwei Lin, Saravan Rajmohan, Weiwei Deng, Dongmei Zhang, Feng Sun, Qi Zhang

    Abstract: Query generation is a critical task for web search engines (e.g. Google, Bing) and recommendation systems. Recently, state-of-the-art query generation methods leverage Large Language Models (LLMs) for their strong capabilities in context understanding and text generation. However, they still face challenges in generating high-quality queries in terms of inferring user intent based on their web sea…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: 10 pages

  35. arXiv:2411.00418  [pdf, ps, other]

    cs.CL cs.AI

    Self-Evolved Reward Learning for LLMs

    Authors: Chenghua Huang, Zhizhen Fan, Lu Wang, Fangkai Yang, Pu Zhao, Zeqi Lin, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is a crucial technique for aligning language models with human preferences, playing a pivotal role in the success of conversational models like GPT-4, ChatGPT, and Llama 2. A core challenge in employing RLHF lies in training a reliable reward model (RM), which relies on high-quality labels typically provided by human experts or advanced AI systems.…

    Submitted 3 June, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 23 pages, 6 figures, accepted to ICLR 2025

  36. arXiv:2410.24032  [pdf, other]

    cs.HC cs.AI cs.CL

    Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks

    Authors: Yingzhe Peng, Xiaoting Qin, Zhiyang Zhang, Jue Zhang, Qingwei Lin, Xu Yang, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: The rise of large language models (LLMs) has revolutionized user interactions with knowledge-based systems, enabling chatbots to synthesize vast amounts of information and assist with complex, exploratory tasks. However, LLM-based chatbots often struggle to provide personalized support, particularly when users start with vague queries or lack sufficient contextual information. This paper introduce…

    Submitted 31 October, 2024; originally announced October 2024.

  37. arXiv:2410.21545  [pdf, other]

    cs.CL

    CARMO: Dynamic Criteria Generation for Context-Aware Reward Modelling

    Authors: Taneesh Gupta, Shivam Shandilya, Xuchao Zhang, Rahul Madhavan, Supriyo Ghosh, Chetan Bansal, Huaxiu Yao, Saravan Rajmohan

    Abstract: Reward modeling in large language models is susceptible to reward hacking, causing models to latch onto superficial features such as the tendency to generate lists or unnecessarily long responses. In reinforcement learning from human feedback (RLHF), and more generally during post-training, flawed reward signals often lead to outputs that optimize for these spurious correlates instead of genuine qua…

    Submitted 17 February, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

  38. arXiv:2409.17642  [pdf, other]

    cs.AI cs.CY

    AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure

    Authors: Xi Chen, Zhiyang Zhang, Fangkai Yang, Xiaoting Qin, Chao Du, Xi Cheng, Hangxin Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Large language model (LLM)-based AI delegates are increasingly utilized to act on behalf of users, assisting them with a wide range of tasks through conversational interfaces. Despite their advantages, concerns arise regarding the potential risk of privacy leaks, particularly in scenarios involving social interactions. While existing research has focused on protecting privacy by limiting the acces…

    Submitted 7 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

  39. arXiv:2409.17140  [pdf, other]

    cs.AI

    AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents

    Authors: Junting Lu, Zhiyang Zhang, Fangkai Yang, Jue Zhang, Lu Wang, Chao Du, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Multimodal large language models (MLLMs) have enabled LLM-based agents to directly interact with application user interfaces (UIs), enhancing agents' performance in complex tasks. However, these agents often suffer from high latency and low reliability due to the extensive sequential UI interactions. To address this issue, we propose AXIS, a novel LLM-based agent framework that prioritizes actions…

    Submitted 19 May, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

  40. arXiv:2408.13510  [pdf, other]

    cs.DC eess.SY

    Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Load Balancing

    Authors: Kunal Jain, Anjaly Parayil, Ankur Mallick, Esha Choukse, Xiaoting Qin, Jue Zhang, Íñigo Goiri, Rujia Wang, Chetan Bansal, Victor Rühle, Anoop Kulkarni, Steve Kofsky, Saravan Rajmohan

    Abstract: Large Language Model (LLM) workloads have distinct prefill and decode phases with different compute and memory requirements, which should ideally be accounted for when scheduling input queries across different LLM instances in a cluster. However, existing scheduling algorithms treat LLM workloads as monolithic jobs without considering the distinct characteristics of the two phases in each workload.…

    Submitted 7 January, 2025; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: 16 pages, 10 figures

  41. arXiv:2408.04259  [pdf, other]

    cs.CL cs.AI

    EfficientRAG: Efficient Retriever for Multi-Hop Question Answering

    Authors: Ziyuan Zhuang, Zhiyang Zhang, Sitao Cheng, Fangkai Yang, Jia Liu, Shujian Huang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Retrieval-augmented generation (RAG) methods encounter difficulties when addressing complex questions like multi-hop queries. While iterative retrieval methods improve performance by gathering additional information, current approaches often rely on multiple calls of large language models (LLMs). In this paper, we introduce EfficientRAG, an efficient retriever for multi-hop question answering. Eff…

    Submitted 26 September, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 20 pages, 4 figures

  42. arXiv:2408.00764  [pdf, other]

    cs.CL cs.AI cs.LG

    AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

    Authors: Mengkang Hu, Pu Zhao, Can Xu, Qingfeng Sun, Jianguang Lou, Qingwei Lin, Ping Luo, Saravan Rajmohan

    Abstract: Large Language Model-based agents have garnered significant attention and are becoming increasingly popular. Furthermore, planning ability is a crucial component of an LLM-based agent, which generally entails achieving a desired goal from an initial state. This paper investigates enhancing the planning abilities of LLMs through instruction tuning, referred to as agent training. Recent studies have…

    Submitted 6 February, 2025; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2025 (Research Track). Project page: https://agent-gen.github.io/

  43. arXiv:2407.14402  [pdf, other]

    cs.AI cs.CL cs.DC cs.MA cs.SE

    The Vision of Autonomic Computing: Can LLMs Make It a Reality?

    Authors: Zhiyang Zhang, Fangkai Yang, Xiaoting Qin, Jue Zhang, Qingwei Lin, Gong Cheng, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: The Vision of Autonomic Computing (ACV), proposed over two decades ago, envisions computing systems that self-manage akin to biological organisms, adapting seamlessly to changing environments. Despite decades of research, achieving ACV remains challenging due to the dynamic and complex nature of modern computing systems. Recent advancements in Large Language Models (LLMs) offer promising solutions…

    Submitted 19 July, 2024; originally announced July 2024.

  44. arXiv:2407.12165  [pdf, other]

    cs.SE cs.AI cs.DC

    Building AI Agents for Autonomous Clouds: Challenges and Design Principles

    Authors: Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, Saravan Rajmohan

    Abstract: The rapid growth in the use of Large Language Models (LLMs) and AI Agents as part of software development and deployment is revolutionizing the information technology landscape. While code generation receives significant attention, a higher-impact application lies in using AI agents for operational resilience of cloud services, which currently require significant human effort and domain knowledge.…

    Submitted 31 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  45. arXiv:2406.19251  [pdf, other]

    cs.CL cs.AI

    AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

    Authors: Jia Fu, Xiaoting Qin, Fangkai Yang, Lu Wang, Jue Zhang, Qingwei Lin, Yubo Chen, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Recent advancements in Large Language Models have transformed ML/AI development, necessitating a reevaluation of AutoML principles for the Retrieval-Augmented Generation (RAG) systems. To address the challenges of hyper-parameter optimization and online adaptation in RAG, we propose the AutoRAG-HP framework, which formulates the hyper-parameter tuning as an online multi-armed bandit (MAB) problem…

    Submitted 27 June, 2024; originally announced June 2024.

  46. arXiv:2406.13372  [pdf, other]

    cs.AI

    Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

    Authors: Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Shuzheng Si, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang, Baobao Chang

    Abstract: Recent advances in retrieval-augmented generation have significantly improved the performance of question-answering systems, particularly on factoid '5Ws' questions. However, these systems still face substantial challenges when addressing '1H' questions, specifically how-to questions, which are integral to decision-making processes and require dynamic, step-by-step answers. The key limitation lies…

    Submitted 10 October, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Work in progress

  47. arXiv:2406.01047  [pdf, other]

    cs.DC cs.AI cs.LG

    An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

    Authors: Hang Dong, Liwen Zhu, Zhao Shan, Bo Qiao, Fangkai Yang, Si Qin, Chuan Luo, Qingwei Lin, Yuwen Yang, Gurpreet Virdi, Saravan Rajmohan, Dongmei Zhang, Thomas Moscibroda

    Abstract: Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization without affecting users' experience on cloud computing platforms. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are provided with…

    Submitted 3 June, 2024; originally announced June 2024.

  48. arXiv:2405.15370  [pdf, other]

    cs.CL

    Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection

    Authors: Jun Liu, Chaoyun Zhang, Jiaxu Qian, Minghua Ma, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Time series anomaly detection (TSAD) plays a crucial role in various industries by identifying atypical patterns that deviate from standard trends, thereby maintaining system integrity and enabling prompt response measures. Traditional TSAD models, which often rely on deep learning, require extensive training data and operate as black boxes, lacking interpretability for detected anomalies. To addr…

    Submitted 24 May, 2024; originally announced May 2024.

  49. arXiv:2405.10480  [pdf, other]

    cs.AR cs.LG

    Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

    Authors: Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan

    Abstract: Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has increased steadily, reaching billions of parameters. These huge models are memory hungry and incur significant inference latency even on cutting-edge AI accelerators, such as GPUs. Specifica…

    Submitted 14 January, 2025; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: 13 pages, 10 figures

    ACM Class: I.2.7; C.1.4

  50. arXiv:2404.19143  [pdf, other]

    cs.DC

    Workload Intelligence: Punching Holes Through the Cloud Abstraction

    Authors: Lexiang Huang, Anjaly Parayil, Jue Zhang, Xiaoting Qin, Chetan Bansal, Jovan Stojkovic, Pantea Zardoshti, Pulkit Misra, Eli Cortez, Raphael Ghelman, Íñigo Goiri, Saravan Rajmohan, Jim Kleewein, Rodrigo Fonseca, Timothy Zhu, Ricardo Bianchini

    Abstract: Today, cloud workloads are essentially opaque to the cloud platform. Typically, the only information the platform receives is the virtual machine (VM) type and possibly a decoration to the type (e.g., the VM is evictable). Similarly, workloads receive little to no information from the platform; generally, workloads might receive telemetry from their VMs or exceptional signals (e.g., shortly before…

    Submitted 29 April, 2024; originally announced April 2024.