Skip to main content

Showing 1–3 of 3 results for author: Schroeder, L G

.
  1. arXiv:2502.08235  [pdf, other

    cs.AI

    The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

    Authors: Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez

    Abstract: Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs. A phenomenon where models favor extended internal reasoning chains over environmental interaction. Through experiments on software engineering tasks using SWE Bench Verified, we observ… ▽ More

    Submitted 12 February, 2025; originally announced February 2025.

  2. arXiv:2502.03771  [pdf, other

    cs.LG cs.CL

    vCache: Verified Semantic Prompt Caching

    Authors: Luis Gaspar Schroeder, Aditya Desai, Alejandro Cuadron, Kyle Chu, Shu Liu, Mark Zhao, Stephan Krusche, Alfons Kemper, Matei Zaharia, Joseph E. Gonzalez

    Abstract: Semantic caches return cached LLM-generated responses for semantically similar prompts to reduce inference latency and cost. They embed cached prompts and store them alongside their response in a vector database. Embedding similarity metrics assign a numerical score to quantify the similarity between a request and its nearest neighbor prompt from the cache. Existing systems use the same static sim… ▽ More

    Submitted 27 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  3. arXiv:2403.05821  [pdf, other

    cs.LG cs.DB

    Optimizing LLM Queries in Relational Data Analytics Workloads

    Authors: Shu Liu, Asim Biswal, Amog Kamsetty, Audrey Cheng, Luis Gaspar Schroeder, Liana Patel, Shiyi Cao, Xiangxi Mo, Ion Stoica, Joseph E. Gonzalez, Matei Zaharia

    Abstract: Batch data analytics is a growing application for Large Language Models (LLMs). LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets. However, LLM inference is highly costly and slow: for example, an NVIDIA L4 GPU running Llama3-8B can only process 6 KB of text per second, taking about a day to handle 1… ▽ More

    Submitted 9 April, 2025; v1 submitted 9 March, 2024; originally announced March 2024.