-
Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
Authors:
Siddhant Bhambri,
Upasana Biswas,
Subbarao Kambhampati
Abstract:
Question Answering (QA) poses a challenging and critical problem, particularly in today's age of interactive dialogue systems such as ChatGPT, Perplexity, and Microsoft Copilot, where users demand both accuracy and transparency in the model's outputs. Since smaller language models (SLMs) are computationally more efficient but often under-perform compared to larger models, Knowledge Distillation (KD) methods allow for finetuning these smaller models to improve their final performance. Lately, the intermediate tokens or the so-called "reasoning" traces produced by Chain-of-Thought (CoT) or by reasoning models such as DeepSeek R1 are used as a training signal for KD. However, these reasoning traces are often verbose and difficult to interpret or evaluate. In this work, we aim to address the challenge of evaluating the faithfulness of these reasoning traces and their correlation with the final performance. To this end, we employ a KD method leveraging rule-based problem decomposition. This approach allows us to break down complex queries into structured sub-problems, generating interpretable traces whose correctness can be readily evaluated, even at inference time. Specifically, we demonstrate this approach on Open Book QA, decomposing the problem into a Classification step and an Information Retrieval step, thereby simplifying trace evaluation. Our SFT experiments with correct and incorrect traces on the CoTemp QA, Microsoft Machine Reading Comprehension QA, and Facebook bAbI QA datasets reveal the striking finding that correct traces do not necessarily imply that the model outputs the correct final solution. Similarly, we find a low correlation between correct final solutions and intermediate trace correctness. These results challenge the implicit assumption behind utilizing reasoning traces for improving SLMs' final performance via KD.
Submitted 19 May, 2025;
originally announced May 2025.
-
Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!
Authors:
Subbarao Kambhampati,
Kaya Stechly,
Karthik Valmeekam,
Lucas Saldyt,
Siddhant Bhambri,
Vardhan Palod,
Atharva Gundawar,
Soumya Rani Samineni,
Durgesh Kalwar,
Upasana Biswas
Abstract:
Intermediate token generation (ITG), where a model produces output before the solution, has been proposed as a method to improve the performance of language models on reasoning tasks. These intermediate tokens have been called "reasoning traces" or even "thoughts" -- implicitly anthropomorphizing the model, implying these tokens resemble steps a human might take when solving a challenging problem. In this paper, we present evidence that this anthropomorphization isn't a harmless metaphor, and instead is quite dangerous -- it confuses the nature of these models and how to use them effectively, and leads to questionable research.
Submitted 27 May, 2025; v1 submitted 13 April, 2025;
originally announced April 2025.
-
Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming
Authors:
Upasana Biswas,
Vardhan Palod,
Siddhant Bhambri,
Subbarao Kambhampati
Abstract:
State-of-the-art methods for Human-AI Teaming and Zero-shot Cooperation focus on task completion, i.e., task rewards, as the sole evaluation metric while being agnostic to how the two agents work with each other. Furthermore, subjective user studies only offer limited insight into the quality of cooperation existing within the team. Specifically, we are interested in understanding the cooperative behaviors arising within the team when trained agents are paired with humans -- a problem that has been overlooked by the existing literature. To formally address this problem, we propose the concept of constructive interdependence -- measuring how much agents rely on each other's actions to achieve the shared goal -- as a key metric for evaluating cooperation in human-agent teams. We interpret interdependence in terms of action interactions in a STRIPS formalism, and define metrics that allow us to assess the degree of reliance between the agents' actions. We pair state-of-the-art agents HAT with learned human models as well as human participants in a user study for the popular Overcooked domain, and evaluate the task reward and teaming performance for these human-agent teams. Our results demonstrate that although trained agents attain high task rewards, they fail to induce cooperative behavior, showing very low levels of interdependence across teams. Furthermore, our analysis reveals that teaming performance is not necessarily correlated with task reward, highlighting that task reward alone cannot reliably measure cooperation arising in a team.
Submitted 1 June, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Softening the Impact of Collisions in Contention Resolution
Authors:
Umesh Biswas,
Trisha Chakraborty,
Maxwell Young
Abstract:
Contention resolution addresses the problem of coordinating access to a shared communication channel. Time is discretized into synchronized slots, and a packet can be sent in any slot. If no packet is sent, then the slot is empty; if a single packet is sent, then it is successful; and when multiple packets are sent at the same time, a collision occurs, resulting in the failure of the corresponding transmissions. In each slot, every packet receives ternary channel feedback indicating whether the current slot is empty, successful, or a collision.
Much of the prior work on contention resolution has focused on optimizing the makespan, which is the number of slots required for all packets to succeed. However, in many modern systems, collisions are also costly in terms of the time they incur, which existing contention-resolution algorithms do not address.
In this paper, we design and analyze a randomized algorithm, Collision Aversion Backoff (CAB), that optimizes both the makespan and the collision cost. We consider the static case where an unknown $n\geq 2$ packets are initially present in the system, and each collision has a known cost $\mathcal{C}$, where $1 \leq \mathcal{C} \leq n^κ$ for a known constant $κ\geq 0$. With error probability polynomially small in $n$, CAB guarantees that all packets succeed with makespan and a total expected collision cost of $\tilde{O}(n\sqrt{\mathcal{C}})$. We give a lower bound for the class of fair algorithms: where, in each slot, every packet executing the fair algorithm sends with the same probability (and the probability may change from slot to slot). Our lower bound is asymptotically tight up to a $\texttt{poly}(\log n)$-factor for sufficiently large $\mathcal{C}$.
Submitted 20 August, 2024;
originally announced August 2024.
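The channel model in the abstract above (synchronized slots, ternary feedback, and "fair" algorithms in which every packet sends with the same per-slot probability) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the CAB algorithm: the names `slot_feedback` and `simulate_fair`, the `send_prob` schedule, and the `max_slots` cutoff are hypothetical and chosen for illustration.

```python
import random
from typing import Callable, Dict


def slot_feedback(num_senders: int) -> str:
    """Ternary channel feedback for one slot, as described in the abstract."""
    if num_senders == 0:
        return "empty"
    if num_senders == 1:
        return "success"
    return "collision"


def simulate_fair(
    n: int,
    collision_cost: float,
    send_prob: Callable[[int], float],
    seed: int = 0,
    max_slots: int = 100_000,
) -> Dict[str, float]:
    """Simulate a fair algorithm on n packets (static case).

    In slot t every remaining packet sends with the same probability
    send_prob(t). The schedule is an assumption of this sketch; it is
    not the schedule used by CAB.
    """
    rng = random.Random(seed)
    remaining = n
    slots = 0
    collisions = 0
    while remaining > 0 and slots < max_slots:
        p = send_prob(slots)
        # Each remaining packet decides independently whether to send.
        senders = sum(1 for _ in range(remaining) if rng.random() < p)
        feedback = slot_feedback(senders)
        if feedback == "success":
            remaining -= 1  # the lone sender succeeds and leaves
        elif feedback == "collision":
            collisions += 1  # every transmission in this slot fails
        slots += 1
    return {
        "slots": slots,  # makespan if all packets succeeded
        "succeeded": n - remaining,
        "collisions": collisions,
        "collision_cost": collisions * collision_cost,
    }


if __name__ == "__main__":
    # Hypothetical schedule: sending probability decays with the slot index.
    print(simulate_fair(n=8, collision_cost=2.0, send_prob=lambda t: 1.0 / (t + 1)))
```

Reporting the slot count and the total collision cost separately makes the trade-off visible: an aggressive schedule may finish in few slots but pay for many collisions, while a cautious one does the opposite.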
-
Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming
Authors:
Siddhant Bhambri,
Mudit Verma,
Upasana Biswas,
Anil Murthy,
Subbarao Kambhampati
Abstract:
Preference-based Reinforcement Learning (PbRL) has made significant strides in single-agent settings, but has not been studied for multi-agent frameworks. On the other hand, modeling cooperation between multiple agents, specifically in Human-AI Teaming settings, while ensuring successful task completion is a challenging problem. To this end, we perform the first investigation of multi-agent PbRL by extending single-agent PbRL to the two-agent teaming setting and formulating it as a Human-AI PbRL Cooperation Game, where the RL agent queries the human-in-the-loop to elicit the task objective and the human's preferences on the joint team behavior. Under this game formulation, we first introduce the notion of Human Flexibility to evaluate team performance based on whether humans prefer to follow a fixed policy or adapt to the RL agent on the fly. Secondly, we study the RL agent's varying access to the human policy. We highlight a special case along these two dimensions, which we call Specified Orchestration, where the human is least flexible and the agent has complete access to the human policy. We motivate the need for taking Human Flexibility into account and the usefulness of Specified Orchestration through a gamified user study. We evaluate state-of-the-art PbRL algorithms for Human-AI cooperative setups through robot-locomotion-based domains that explicitly require forced cooperation. Our findings highlight the challenges associated with PbRL by varying Human Flexibility and the agent's access to the human policy. Finally, we draw insights from our user study and empirical results, and conclude that Specified Orchestration can be seen as an upper bound on PbRL performance for future research in Human-AI teaming scenarios.
Submitted 24 September, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Minimization of Handoff latency by co-ordinate evaluation method using GPS based map
Authors:
Debabrata Sarddar,
Joydeep Banerjee,
Souvik Kumar Saha,
Tapas Jana,
Utpal Biswas,
M. K. Naskar
Abstract:
Handoff has become an essential criterion in mobile communication systems, especially in urban areas, owing to the limited coverage area of Access Points (APs). Handover of calls between two Base Stations (BSs) is encountered frequently, and it is essential to minimize the delay of the process. Many solutions attempting to improve this process have been proposed, but only a few use geo-location systems in the management of the handover. Here we propose to minimize the handoff latency by minimizing the number of APs scanned by the Mobile Node (MN) during each handoff procedure. We consider the whole topographical area as a two-dimensional plane. By GPS, we can note down the co-ordinates of the MN at any instant. The average rate of change of its latitudinal distance and longitudinal distance over a specific time period is evaluated at the end of the given time period. With the knowledge of this parameter, it is possible to determine the latitude and longitude of the MN after a particular instant of time. Hence, the direction of motion of the MN can be determined, which in turn gives the AP towards which the MN is heading. This reduces the number of APs to be scanned. Thus, on an overall basis, the handoff latency can be reduced by almost half to one third of its value.
Submitted 10 August, 2010;
originally announced August 2010.
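The co-ordinate evaluation idea in the abstract above (average rate of change of position over a time window, extrapolated position, and the AP lying in the direction of motion) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' procedure: positions are treated as points on a flat two-dimensional plane, and the function names, the AP map, and the `max_angle_deg` cone width are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (time, x, y) on a flat 2-D plane


def average_velocity(track: Sequence[Sample]) -> Tuple[float, float]:
    """Average rate of change of x and y over the observation window."""
    (t0, x0, y0), (t1, x1, y1) = track[0], track[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("track must span a positive time interval")
    return ((x1 - x0) / dt, (y1 - y0) / dt)


def predict_position(track: Sequence[Sample], horizon: float) -> Tuple[float, float]:
    """Linearly extrapolate the MN position `horizon` time units ahead."""
    vx, vy = average_velocity(track)
    _, x, y = track[-1]
    return (x + vx * horizon, y + vy * horizon)


def aps_ahead(
    track: Sequence[Sample],
    aps: Dict[str, Tuple[float, float]],
    max_angle_deg: float = 45.0,  # hypothetical cone half-width
) -> List[str]:
    """Names of APs whose bearing lies within a cone around the heading."""
    vx, vy = average_velocity(track)
    _, x, y = track[-1]
    if vx == 0 and vy == 0:
        return sorted(aps)  # stationary MN: no heading, scan every AP
    heading = math.atan2(vy, vx)
    selected = []
    for name, (ax, ay) in aps.items():
        bearing = math.atan2(ay - y, ax - x)
        # Smallest absolute angle between bearing and heading, in [0, pi].
        diff = abs((bearing - heading + math.pi) % (2 * math.pi) - math.pi)
        if math.degrees(diff) <= max_angle_deg:
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    track = [(0.0, 0.0, 0.0), (10.0, 20.0, 0.0)]
    aps = {"AP1": (50.0, 0.0), "AP2": (-50.0, 0.0), "AP3": (20.0, 60.0)}
    print(predict_position(track, horizon=5.0))
    print(aps_ahead(track, aps))
```

Restricting the scan to the APs returned by `aps_ahead` is what shrinks the scan list; how wide the cone should be is a tuning choice that the abstract does not specify.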
-
Minimization of Handoff Failure Probability for Next-Generation Wireless Systems
Authors:
Debabrata Sarddar,
Tapas Jana,
Souvik Kumar Saha,
Joydeep Banerjee,
Utpal Biswas,
M. K. Naskar
Abstract:
During the past few years, advances in mobile communication theory have enabled the development and deployment of different wireless technologies, complementary to each other. Hence, their integration can realize a unified wireless system that has the best features of the individual networks. Next-Generation Wireless Systems (NGWS) integrate different wireless systems, each of which is optimized for some specific services and coverage area to provide ubiquitous communications to the mobile users. In this paper, we propose to enhance the handoff performance of mobile IP in wireless IP networks by reducing the false handoff probability in the NGWS handoff management protocol. Based on the information of false handoff probability, we analyze its effect on mobile speed and handoff signaling delay.
Submitted 17 June, 2010;
originally announced June 2010.