Showing 1–19 of 19 results for author: Valmeekam, K

  1. arXiv:2505.13775  [pdf, ps, other]

    cs.LG cs.AI

    Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens

    Authors: Kaya Stechly, Karthik Valmeekam, Atharva Gundawar, Vardhan Palod, Subbarao Kambhampati

    Abstract: Recent impressive results from large reasoning models have been interpreted as a triumph of Chain of Thought (CoT), and especially of the process of training on CoTs sampled from base LLMs in order to help find new reasoning patterns. In this paper, we critically examine that interpretation by investigating how the semantics of intermediate tokens -- often anthropomorphized as "thoughts" or reasoning…

    Submitted 27 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  2. arXiv:2505.13697  [pdf, ps, other]

    cs.LG cs.AI

    RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs

    Authors: Soumya Rani Samineni, Durgesh Kalwar, Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati

    Abstract: Reinforcement learning-based post-training of large language models (LLMs) has recently gained attention, particularly following the release of DeepSeek R1, which applied GRPO for fine-tuning. Amid the growing hype around improved reasoning abilities attributed to RL post-training, we critically examine the formulation and assumptions underlying these methods. We start by highlighting the popular…

    Submitted 26 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  3. arXiv:2504.09762  [pdf, other]

    cs.AI

    Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!

    Authors: Subbarao Kambhampati, Kaya Stechly, Karthik Valmeekam, Lucas Saldyt, Siddhant Bhambri, Vardhan Palod, Atharva Gundawar, Soumya Rani Samineni, Durgesh Kalwar, Upasana Biswas

    Abstract: Intermediate token generation (ITG), where a model produces output before the solution, has been proposed as a method to improve the performance of language models on reasoning tasks. These intermediate tokens have been called "reasoning traces" or even "thoughts" -- implicitly anthropomorphizing the model, implying these tokens resemble steps a human might take when solving a challenging problem.…

    Submitted 27 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

    Comments: This is a fork of v1. This fork, while overlapping with v1 in the background section, differs in both the overall focus and the specific argument against anthropomorphization of reasoning traces

  4. arXiv:2411.14484  [pdf, other]

    cs.CL cs.AI

    Robust Planning with Compound LLM Architectures: An LLM-Modulo Approach

    Authors: Atharva Gundawar, Karthik Valmeekam, Mudit Verma, Subbarao Kambhampati

    Abstract: Previous work has attempted to boost Large Language Model (LLM) performance on planning and scheduling tasks through a variety of prompt engineering techniques. While these methods can work within the distributions tested, they are neither robust nor predictable. This limitation can be addressed through compound LLM architectures where LLMs work in conjunction with other components to ensure relia…

    Submitted 19 November, 2024; originally announced November 2024.

  5. arXiv:2410.02162  [pdf, other]

    cs.AI

    Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1

    Authors: Karthik Valmeekam, Kaya Stechly, Atharva Gundawar, Subbarao Kambhampati

    Abstract: The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities, but -- despite the slew of new…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: text overlap with arXiv:2409.13373

  6. arXiv:2409.13373  [pdf, other]

    cs.AI cs.CL

    LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench

    Authors: Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati

    Abstract: The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities. PlanBench, an extensible benchm…

    Submitted 20 September, 2024; originally announced September 2024.

  7. arXiv:2405.20625  [pdf, other]

    cs.AI

    Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

    Authors: Atharva Gundawar, Mudit Verma, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati

    Abstract: As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such…

    Submitted 31 May, 2024; originally announced May 2024.

  8. arXiv:2405.04776  [pdf, other]

    cs.AI

    Chain of Thoughtlessness? An Analysis of CoT in Planning

    Authors: Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

    Abstract: Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. Previous work has claimed that this can be mitigated with chain of thought prompting -- a method of demonstrating solution procedures -- with the intuition that it is possible to in-context teach an LLM an algorithm for solving the problem. This paper presents a case study of chain of thought…

    Submitted 12 March, 2025; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024

  9. arXiv:2402.08115  [pdf, other]

    cs.AI

    On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks

    Authors: Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

    Abstract: There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples -- ranging from multiplication to simple planning -- there persists a widespread belief that LLMs can self-critique and improve their own solutions in an itera…

    Submitted 3 August, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.12397

  10. arXiv:2402.01817  [pdf, other]

    cs.AI cs.LG

    LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

    Authors: Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

    Abstract: There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the probl…

    Submitted 11 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  11. arXiv:2311.00226  [pdf, other]

    eess.SP cs.LG

    Transformers are Provably Optimal In-context Estimators for Wireless Communications

    Authors: Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Srinivas Shakkottai, Dileep Kalathil, Jean-Francois Chamberland

    Abstract: Pre-trained transformers exhibit the capability of adapting to new tasks through in-context learning (ICL), where they efficiently utilize a limited set of prompts without explicit model optimization. The canonical communication problem of estimating transmitted symbols from received observations can be modeled as an in-context learning problem: received observations are a noisy function of transm…

    Submitted 11 March, 2025; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: Accepted at AISTATS 2025

  12. arXiv:2310.08118  [pdf, other]

    cs.AI

    Can Large Language Models Really Improve by Self-critiquing Their Own Plans?

    Authors: Karthik Valmeekam, Matthew Marquez, Subbarao Kambhampati

    Abstract: There have been widespread claims about Large Language Models (LLMs) being able to successfully verify or self-critique their candidate solutions in reasoning problems in an iterative mode. Intrigued by those claims, in this paper we set out to investigate the verification/self-critiquing abilities of large language models in the context of planning. We evaluate a planning system that employs LLMs…

    Submitted 12 October, 2023; originally announced October 2023.

  13. arXiv:2306.04050  [pdf, ps, other]

    cs.IT cs.CL cs.LG

    LLMZip: Lossless Text Compression using Large Language Models

    Authors: Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Dileep Kalathil, Jean-Francois Chamberland, Srinivas Shakkottai

    Abstract: We provide new estimates of an asymptotic upper bound on the entropy of English using the large language model LLaMA-7B as a predictor for the next token given a window of past tokens. This estimate is significantly smaller than currently available estimates in \cite{cover1978convergent}, \cite{lutati2023focus}. A natural byproduct is an algorithm for lossless compression of English text which com…

    Submitted 26 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 7 pages, 4 figures, 4 tables, preprint, added results on using LLMs with arithmetic coding

  14. arXiv:2305.15771  [pdf, other]

    cs.AI

    On the Planning Abilities of Large Language Models : A Critical Investigation

    Authors: Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati

    Abstract: Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs in LLM-Modulo settings where they act as a source of heuristic guidance for external plan…

    Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 Spotlight. arXiv admin note: substantial text overlap with arXiv:2206.10498

  15. arXiv:2305.14909  [pdf, other]

    cs.AI

    Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning

    Authors: Lin Guan, Karthik Valmeekam, Sarath Sreedharan, Subbarao Kambhampati

    Abstract: There is a growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several factors, including limited correctness of plans, strong reliance on feedback from interactions with simulators or even the actual environment, and the inefficiency in utilizing human feedback. In this wor…

    Submitted 1 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  16. arXiv:2302.06706  [pdf, other]

    cs.AI cs.CL cs.LG

    On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

    Authors: Karthik Valmeekam, Sarath Sreedharan, Matthew Marquez, Alberto Olmo, Subbarao Kambhampati

    Abstract: Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) how good LLMs are by themselves in generating and validating simple plans in commonsense planning tasks (of the type that humans are generally quite good at) and (2) how good LLMs are in being a source of heu…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2206.10498

  17. arXiv:2210.15906  [pdf, other]

    cs.AI cs.HC cs.LG

    Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences

    Authors: Lin Guan, Karthik Valmeekam, Subbarao Kambhampati

    Abstract: Generating complex behaviors that satisfy the preferences of non-expert users is a crucial requirement for AI agents. Interactive reward learning from trajectory comparisons (a.k.a. RLHF) is one way to allow non-expert users to convey complex objectives by expressing preferences over short clips of agent behaviors. Even though this parametric method can encode complex tacit knowledge present in th…

    Submitted 27 February, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: ICLR 2023 Camera Ready

  18. arXiv:2206.10498  [pdf, other]

    cs.CL cs.AI

    PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change

    Authors: Karthik Valmeekam, Matthew Marquez, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati

    Abstract: Generating plans of action, and reasoning about change have long been considered a core competence of intelligent agents. It is thus no surprise that evaluating the planning and reasoning capabilities of large language models (LLMs) has become a hot topic of research. Most claims about LLM planning capabilities are however based on common sense tasks -- where it becomes hard to tell whether LLMs are…

    Submitted 25 November, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2023 Track on Datasets and Benchmarks

  19. arXiv:2011.09644  [pdf, other]

    cs.AI

    RADAR-X: An Interactive Mixed Initiative Planning Interface Pairing Contrastive Explanations and Revised Plan Suggestions

    Authors: Karthik Valmeekam, Sarath Sreedharan, Sailik Sengupta, Subbarao Kambhampati

    Abstract: Decision support systems seek to enable informed decision-making. In recent years, automated planning techniques have been leveraged to empower such systems to better aid the human-in-the-loop. The central idea for such decision support systems is to augment the capabilities of the human-in-the-loop with automated planning techniques and enhance the quality of decision-making. In addition to p…

    Submitted 3 June, 2022; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: Accepted at ICAPS 2022