
Showing 1–5 of 5 results for author: Cuadron, A

  1. arXiv:2506.12886  [pdf, ps, other]

    cs.CL

    ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality

    Authors: Adrián Cuadrón, Aimar Sagasti, Maitane Urruela, Iker De la Iglesia, Ane G Domingo-Aldama, Aitziber Atutxa, Josu Goikoetxea, Ander Barrena

    Abstract: This work presents three different approaches to address the ArchEHR-QA 2025 Shared Task on automated patient question answering. We introduce an end-to-end prompt-based baseline and two two-step methods to divide the task, without utilizing any external knowledge. Both two-step approaches first extract essential sentences from the clinical text, by prompt or similarity ranking, and then generate…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted for publication in Proceedings of the 24th Workshop on Biomedical Natural Language Processing (BioNLP) at ACL 2025

  2. arXiv:2502.08235  [pdf, other]

    cs.AI

    The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

    Authors: Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez

    Abstract: Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs, a phenomenon where models favor extended internal reasoning chains over environmental interaction. Through experiments on software engineering tasks using SWE Bench Verified, we observ…

    Submitted 12 February, 2025; originally announced February 2025.

  3. arXiv:2502.03771  [pdf, other]

    cs.LG cs.CL

    vCache: Verified Semantic Prompt Caching

    Authors: Luis Gaspar Schroeder, Aditya Desai, Alejandro Cuadron, Kyle Chu, Shu Liu, Mark Zhao, Stephan Krusche, Alfons Kemper, Matei Zaharia, Joseph E. Gonzalez

    Abstract: Semantic caches return cached LLM-generated responses for semantically similar prompts to reduce inference latency and cost. They embed cached prompts and store them alongside their response in a vector database. Embedding similarity metrics assign a numerical score to quantify the similarity between a request and its nearest neighbor prompt from the cache. Existing systems use the same static sim…

    Submitted 27 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  4. arXiv:2412.14468  [pdf, ps, other]

    cs.LG cs.AI

    HashAttention: Semantic Sparsity for Faster Inference

    Authors: Aditya Desai, Shuo Yang, Alejandro Cuadron, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

    Abstract: Leveraging long contexts is crucial for advanced AI systems, but attention computation poses a scalability challenge. While scaled dot-product attention (SDPA) exhibits token sparsity, i.e., only a few pivotal tokens significantly contribute to output, exploiting this sparsity remains challenging. Existing methods either suffer from quality degradation or require substantial additional resources. W…

    Submitted 3 June, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted at ICML'2025

  5. arXiv:2410.12784  [pdf, other]

    cs.AI cs.CL cs.LG

    JudgeBench: A Benchmark for Evaluating LLM-based Judges

    Authors: Sijun Tan, Siyuan Zhuang, Kyle Montgomery, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, Ion Stoica

    Abstract: LLM-based judges have emerged as a scalable alternative to human evaluation and are increasingly used to assess, compare, and improve models. However, the reliability of LLM-based judges themselves is rarely scrutinized. As LLMs become more advanced, their responses grow more sophisticated, requiring stronger judges to evaluate them. Existing benchmarks primarily focus on a judge's alignment with…

    Submitted 4 April, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: Published as a conference paper at ICLR 2025