Showing 1–9 of 9 results for author: Stechly, K

Searching in archive cs.
  1. arXiv:2505.13775  [pdf, ps, other]

    cs.LG cs.AI

    Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens

    Authors: Kaya Stechly, Karthik Valmeekam, Atharva Gundawar, Vardhan Palod, Subbarao Kambhampati

    Abstract: Recent impressive results from large reasoning models have been interpreted as a triumph of Chain of Thought (CoT), and especially of the process of training on CoTs sampled from base LLMs in order to help find new reasoning patterns. In this paper, we critically examine that interpretation by investigating how the semantics of intermediate tokens -- often anthropomorphized as "thoughts" or reasoning…

    Submitted 27 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  2. arXiv:2505.13697  [pdf, ps, other]

    cs.LG cs.AI

    RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs

    Authors: Soumya Rani Samineni, Durgesh Kalwar, Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati

    Abstract: Reinforcement learning-based post-training of large language models (LLMs) has recently gained attention, particularly following the release of DeepSeek R1, which applied GRPO for fine-tuning. Amid the growing hype around improved reasoning abilities attributed to RL post-training, we critically examine the formulation and assumptions underlying these methods. We start by highlighting the popular…

    Submitted 26 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  3. arXiv:2504.09762  [pdf, other]

    cs.AI

    Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!

    Authors: Subbarao Kambhampati, Kaya Stechly, Karthik Valmeekam, Lucas Saldyt, Siddhant Bhambri, Vardhan Palod, Atharva Gundawar, Soumya Rani Samineni, Durgesh Kalwar, Upasana Biswas

    Abstract: Intermediate token generation (ITG), where a model produces output before the solution, has been proposed as a method to improve the performance of language models on reasoning tasks. These intermediate tokens have been called "reasoning traces" or even "thoughts" -- implicitly anthropomorphizing the model, implying these tokens resemble steps a human might take when solving a challenging problem.…

    Submitted 27 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

    Comments: This is a fork of v1. This fork, while overlapping with v1 in the background section, differs in both its overall focus and its specific argument against anthropomorphization of reasoning traces

  4. arXiv:2410.02162  [pdf, other]

    cs.AI

    Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1

    Authors: Karthik Valmeekam, Kaya Stechly, Atharva Gundawar, Subbarao Kambhampati

    Abstract: The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities, but -- despite the slew of new…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: text overlap with arXiv:2409.13373

  5. arXiv:2409.13373  [pdf, other]

    cs.AI cs.CL

    LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench

    Authors: Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati

    Abstract: The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities. PlanBench, an extensible benchm…

    Submitted 20 September, 2024; originally announced September 2024.

  6. arXiv:2405.04776  [pdf, other]

    cs.AI

    Chain of Thoughtlessness? An Analysis of CoT in Planning

    Authors: Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

    Abstract: Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. Previous work has claimed that this can be mitigated with chain of thought prompting -- a method of demonstrating solution procedures -- with the intuition that it is possible to in-context teach an LLM an algorithm for solving the problem. This paper presents a case study of chain of thought…

    Submitted 12 March, 2025; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024

  7. arXiv:2402.08115  [pdf, other]

    cs.AI

    On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks

    Authors: Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

    Abstract: There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples -- ranging from multiplication to simple planning -- there persists a widespread belief that LLMs can self-critique and improve their own solutions in an itera…

    Submitted 3 August, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.12397

  8. arXiv:2402.01817  [pdf, other]

    cs.AI cs.LG

    LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

    Authors: Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

    Abstract: There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the probl…

    Submitted 11 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  9. arXiv:2310.12397  [pdf, other]

    cs.AI

    GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems

    Authors: Kaya Stechly, Matthew Marquez, Subbarao Kambhampati

    Abstract: There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples, a widespread belief in their iterative self-critique capabilities persists. In this paper, we set out to systematically investigate the effectiveness of i…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: 18 pages, 3 figures