-
HEART: Emotionally-driven test-time scaling of Language Models
Authors:
Gabriela Pinto,
Palash Goyal,
Yiwen Song,
Souradip Chakraborty,
Zifeng Wang,
Tomas Pfister,
Hamid Palangi
Abstract:
Test-time scaling has shown considerable success in improving the performance of language models on complex reasoning tasks without requiring fine-tuning. However, current strategies such as self-reflection primarily focus on logical or structural refinement. They do not leverage the guiding potential of affective feedback. Inspired by psychological research showing that emotions can modulate cognitive performance, we introduce HEART--a novel framework that uses emotionally-driven prompts for iterative self-correction. HEART provides feedback on a model's incorrect response using a curated set of concise, emotionally charged phrases based on the six universal emotions categorized by Dr. Paul Ekman. By systematically varying the emotional tone of the feedback across iterations, our method guides the model to escape flawed reasoning paths and explore more promising alternatives. We evaluate our framework on challenging reasoning benchmarks including OlympiadBench, Humanity's Last Exam, and SimpleQA. Our results reveal a significant new phenomenon: when guided by an oracle verifier, this affective iteration protocol unlocks significantly deeper reasoning, leading to consistent and substantial increases in accuracy over state-of-the-art baselines with the same verifier. However, we also identify a critical bottleneck for practical deployment. In a verifier-free setting, the method struggles to harness these gains consistently, which we highlight as a key challenge for future work. Our findings suggest that the next frontier in machine reasoning may lie not just in refining logic, but also in understanding and leveraging the `HEART' of the models.
Submitted 26 September, 2025;
originally announced September 2025.
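The iteration protocol described in the HEART abstract above can be sketched as a verifier-gated retry loop. This is only an illustration under stated assumptions: the feedback phrases, the `model` and `verifier` callables, and the function name `emotional_self_correction` are hypothetical placeholders, not the paper's curated prompts or code. Only the six emotion labels (Ekman's basic emotions) come from the abstract's framing.

```python
from itertools import cycle
from typing import Callable, Optional, Tuple

# Ekman's six basic emotions. The phrases are hypothetical placeholders,
# NOT the curated phrases used in the paper.
EMOTION_PHRASES = {
    "anger": "That answer is wrong, and it is frustrating. Fix it.",
    "disgust": "That reasoning is sloppy. Start over more carefully.",
    "fear": "A wrong answer here would be costly. Please re-check.",
    "happiness": "You are close, and I am glad you are trying. Try again.",
    "sadness": "It is disappointing that this is still wrong. Try again.",
    "surprise": "I did not expect that mistake. Reconsider your approach.",
}


def emotional_self_correction(
    question: str,
    model: Callable[[str], str],      # hypothetical: prompt -> answer
    verifier: Callable[[str], bool],  # hypothetical oracle: answer -> correct?
    max_iters: int = 6,
) -> Tuple[Optional[str], int]:
    """Retry with a different emotional tone after each rejected answer.

    Returns (accepted_answer, attempts), or (None, max_iters) if the
    verifier never accepts an answer.
    """
    prompt = question
    phrases = cycle(EMOTION_PHRASES.values())
    for attempt in range(1, max_iters + 1):
        answer = model(prompt)
        if verifier(answer):
            return answer, attempt
        # Vary the emotional tone of the feedback on each iteration.
        prompt = (
            f"{question}\nPrevious answer: {answer}\nFeedback: {next(phrases)}"
        )
    return None, max_iters
```

With an oracle verifier the loop stops at the first accepted answer. The verifier-free setting discussed in the abstract would need a different stopping rule, which this sketch does not attempt.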
-
Lost in Transition: The Struggle of Women Returning to Software Engineering Research after Career Breaks
Authors:
Shalini Chakraborty,
Sebastian Baltes
Abstract:
The IT industry provides supportive pathways such as returnship programs, coding boot camps, and buddy systems for women re-entering their job after a career break. Academia, however, offers limited opportunities to motivate women to return. We propose a diverse multicultural research project investigating the challenges faced by women with software engineering (SE) backgrounds re-entering academia or related research roles after a career break. Career disruptions due to pregnancy, immigration status, or lack of flexible work options can significantly impact women's career progress, creating barriers for returning as lecturers, professors, or senior researchers. Although many companies promote gender diversity policies, such measures are less prominent and often under-recognized within academic institutions. Our goal is to explore the specific challenges women encounter when re-entering academic roles compared to industry roles; to understand the institutional perspective, including a comparative analysis of existing policies and opportunities in different countries for women to return to the field; and finally, to provide recommendations that support transparent hiring practices. The research project will be carried out in multiple universities and in multiple countries to capture the diverse challenges and policies that vary by location.
Submitted 25 September, 2025;
originally announced September 2025.
-
Automated Knowledge Graph Construction using Large Language Models and Sentence Complexity Modelling
Authors:
Sydney Anuyah,
Mehedi Mahmud Kaushik,
Krishna Dwarampudi,
Rakesh Shiradkar,
Arjan Durresi,
Sunandan Chakraborty
Abstract:
We introduce CoDe-KG, an open-source, end-to-end pipeline for extracting sentence-level knowledge graphs by combining robust coreference resolution with syntactic sentence decomposition. Using our model, we contribute an open-source dataset of over 150,000 knowledge triples. We also contribute a training corpus of 7248 rows for sentence complexity, 190 rows of gold human annotations for coreference resolution using open-source lung-cancer abstracts from PubMed, 900 rows of gold human annotations for sentence conversion policies, and 398 triples of gold human annotations. We systematically select optimal prompt-model pairs across five complexity categories, showing that hybrid chain-of-thought and few-shot prompting yields up to 99.8% exact-match accuracy on sentence simplification. On relation extraction (RE), our pipeline achieves 65.8% macro-F1 on REBEL, an 8-point gain over the prior state of the art, and 75.7% micro-F1 on WebNLG2, while matching or exceeding performance on Wiki-NRE and CaRB. Ablation studies demonstrate that integrating coreference and decomposition increases recall on rare relations by over 20%. Code and dataset are available at https://github.com/KaushikMahmud/CoDe-KG_EMNLP_2025
Submitted 21 September, 2025;
originally announced September 2025.
-
Mind the Ethics! The Overlooked Ethical Dimensions of GenAI in Software Modeling Education
Authors:
Shalini Chakraborty,
Lola Burgueño,
Nathalie Moreno,
Javier Troya,
Paula Muñoz
Abstract:
Generative Artificial Intelligence (GenAI) is rapidly gaining momentum in software modeling education, embraced by both students and educators. As GenAI assists with interpreting requirements, formalizing models, and translating students' mental models into structured notations, it increasingly shapes core learning outcomes such as domain comprehension, diagrammatic thinking, and modeling fluency without clear ethical oversight or pedagogical guidelines. Yet, the ethical implications of this integration remain underexplored.
In this paper, we conduct a systematic literature review across six major digital libraries in computer science (ACM Digital Library, IEEE Xplore, Scopus, ScienceDirect, SpringerLink, and Web of Science). Our aim is to identify studies discussing the ethical aspects of GenAI in software modeling education, including responsibility, fairness, transparency, diversity, and inclusion among others.
Out of 1,386 unique papers initially retrieved, only three explicitly addressed ethical considerations. This scarcity highlights the critical absence of ethical discourse surrounding GenAI in modeling education, raises urgent questions about the responsible integration of AI in modeling curricula, and underscores the pressing need for structured ethical frameworks in this emerging educational landscape. We examine these three studies and explore the emerging research opportunities as well as the challenges that have arisen in this field.
Submitted 17 September, 2025;
originally announced September 2025.
-
No Infinite $(p,q)$-Theorem for Piercing Compact Convex Sets with Lines in $\mathbb{R}^3$
Authors:
Sutanoya Chakraborty,
Arijit Ghosh
Abstract:
An infinite $(p,q)$-theorem, or an $(\aleph_0,q)$-theorem, involving two families $\mathcal{F}$ and $\mathcal{G}$ of sets, states that if in every infinite subset of $\mathcal{F}$, there are $q$ sets that are intersected by some set in $\mathcal{G}$, then there is a finite set $S_{\mathcal{F}}\subseteq\mathcal{G}$ such that for every $C\in\mathcal{F}$, there is a $B\in S_{\mathcal{F}}$ with $C\cap B\neq\emptyset$. We provide an example demonstrating that there is no $(\aleph_0,q)$-theorem for piercing compact convex sets in $\mathbb{R}^3$ with lines by constructing a family $\mathcal{F}$ of compact convex sets such that it does not have a finite line transversal, but for any $t\in\mathbb{N}$, every infinite subset of $\mathcal{F}$ contains $t$ sets that are pierced by a line.
Submitted 11 September, 2025; v1 submitted 8 September, 2025;
originally announced September 2025.
-
Enhancing Diversity in Large Language Models via Determinantal Point Processes
Authors:
Yilei Chen,
Souradip Chakraborty,
Lorenz Wolf,
Ioannis Ch. Paschalidis,
Aldo Pacchiano
Abstract:
Supervised fine-tuning and reinforcement learning are two popular methods for post-training large language models (LLMs). While improving the model's performance on downstream tasks, they often reduce the model's output diversity, leading to narrow, canonical responses. Existing methods to enhance diversity are limited, either by operating at inference time or by focusing on lexical differences. We propose a novel training method named DQO based on determinantal point processes (DPPs) to jointly optimize LLMs for quality and semantic diversity. Our approach samples and embeds a group of responses for each prompt, then uses the determinant of a kernel-based similarity matrix to measure diversity as the volume spanned by the embeddings of these responses. Experiments across instruction-following, summarization, story generation, and reasoning tasks demonstrate that our method substantially improves semantic diversity without sacrificing model quality.
Submitted 4 September, 2025;
originally announced September 2025.
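The diversity measure in the DQO abstract above (the determinant of a kernel-based similarity matrix, read as the volume spanned by response embeddings) can be illustrated with a small pure-Python sketch. The cosine kernel, the `jitter` term, and the function names are assumptions made for this illustration. The abstract does not specify the kernel or how the determinant enters the training objective.

```python
import math
from typing import List, Sequence


def cosine_kernel(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Gram matrix of unit-normalized embeddings (assumed kernel choice)."""
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def log_det(matrix: List[List[float]], jitter: float = 1e-6) -> float:
    """Log-determinant of a positive semi-definite matrix.

    Uses Gaussian elimination without pivoting; `jitter` is added to the
    diagonal so that near-singular Gram matrices keep positive pivots.
    """
    n = len(matrix)
    a = [
        [matrix[i][j] + (jitter if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    total = 0.0
    for k in range(n):
        pivot = a[k][k]
        if pivot <= 0.0:
            return float("-inf")
        total += math.log(pivot)
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return total


def dpp_log_diversity(embeddings: Sequence[Sequence[float]]) -> float:
    """Higher values mean the embeddings span a larger volume."""
    return log_det(cosine_kernel(embeddings))
```

Orthogonal embeddings give a log-determinant near zero (the maximum for a cosine kernel), while near-duplicate embeddings drive it strongly negative. This is why the determinant rewards semantically distinct responses.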
-
Dimension Agnostic Testing of Survey Data Credibility through the Lens of Regression
Authors:
Debabrota Basu,
Sourav Chakraborty,
Debarshi Chanda,
Buddha Dev Das,
Arijit Ghosh,
Arnab Ray
Abstract:
Assessing whether a sample survey credibly represents the population is a critical question for ensuring the validity of downstream research. Generally, this problem reduces to estimating the distance between two high-dimensional distributions, which typically requires a number of samples that grows exponentially with the dimension. However, depending on the model used for data analysis, the conclusions drawn from the data may remain consistent across different underlying distributions. In this context, we propose a task-based approach to assess the credibility of sampled surveys. Specifically, we introduce a model-specific distance metric to quantify this notion of credibility. We also design an algorithm to verify the credibility of survey data in the context of regression models. Notably, the sample complexity of our algorithm is independent of the data dimension. This efficiency stems from the fact that the algorithm focuses on verifying the credibility of the survey data rather than reconstructing the underlying regression model. Furthermore, we show that if one attempts to verify credibility by reconstructing the regression model, the sample complexity scales linearly with the dimensionality of the data. We prove the theoretical correctness of our algorithm and numerically demonstrate our algorithm's performance.
Submitted 28 August, 2025;
originally announced August 2025.
-
Incentivized Lipschitz Bandits
Authors:
Sourav Chakraborty,
Amit Kiran Rege,
Claire Monteleoni,
Lijun Chen
Abstract:
We study incentivized exploration in multi-armed bandit (MAB) settings with infinitely many arms modeled as elements in continuous metric spaces. Unlike classical bandit models, we consider scenarios where the decision-maker (principal) incentivizes myopic agents to explore beyond their greedy choices through compensation, but with the complication of reward drift--biased feedback arising due to the incentives. We propose novel incentivized exploration algorithms that discretize the infinite arm space uniformly and demonstrate that these algorithms simultaneously achieve sublinear cumulative regret and sublinear total compensation. Specifically, we derive regret and compensation bounds of $\tilde{O}(T^{(d+1)/(d+2)})$, with $d$ representing the covering dimension of the metric space. Furthermore, we generalize our results to contextual bandits, achieving comparable performance guarantees. We validate our theoretical findings through numerical simulations.
Submitted 26 August, 2025;
originally announced August 2025.
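Two ingredients of the Lipschitz bandit abstract above are easy to make concrete: the exponent of the stated regret and compensation bound, read as $(d+1)/(d+2)$, and a uniform discretization of the arm space. The sketch below assumes the arm space is the unit interval $[0,1]$ (so $d=1$). The function names are hypothetical, and the code does not implement the paper's incentive mechanism.

```python
from fractions import Fraction
from typing import List


def regret_exponent(d: int) -> Fraction:
    """Exponent of T in the bound O~(T^((d+1)/(d+2))) for covering dimension d."""
    return Fraction(d + 1, d + 2)


def uniform_grid(num_arms: int) -> List[float]:
    """Centers of `num_arms` equal-width cells covering [0, 1].

    Every point of [0, 1] lies within 1 / (2 * num_arms) of some center,
    so a Lipschitz reward function changes little inside each cell.
    """
    return [(i + 0.5) / num_arms for i in range(num_arms)]


if __name__ == "__main__":
    for d in (1, 2, 5):
        print(d, regret_exponent(d))
    print(uniform_grid(4))
```

The exponent is strictly below 1 for every finite $d$, which is what makes the bound sublinear in $T$, and it approaches 1 as the covering dimension grows.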
-
Locally Pareto-Optimal Interpretations for Black-Box Machine Learning Models
Authors:
Aniruddha Joshi,
Supratik Chakraborty,
S Akshay,
Shetal Shah,
Hazem Torfah,
Sanjit Seshia
Abstract:
Creating meaningful interpretations for black-box machine learning models involves balancing two often conflicting objectives: accuracy and explainability. Exploring the trade-off between these objectives is essential for developing trustworthy interpretations. While many techniques for multi-objective interpretation synthesis have been developed, they typically lack formal guarantees on the Pareto-optimality of the results. Methods that do provide such guarantees, on the other hand, often face severe scalability limitations when exploring the Pareto-optimal space. To address this, we develop a framework based on local optimality guarantees that enables more scalable synthesis of interpretations. Specifically, we consider the problem of synthesizing a set of Pareto-optimal interpretations with local optimality guarantees, within the immediate neighborhood of each solution. Our approach begins with a multi-objective learning or search technique, such as Multi-Objective Monte Carlo Tree Search, to generate a best-effort set of Pareto-optimal candidates with respect to accuracy and explainability. We then verify local optimality for each candidate as a Boolean satisfiability problem, which we solve using a SAT solver. We demonstrate the efficacy of our approach on a set of benchmarks, comparing it against previous methods for exploring the Pareto-optimal front of interpretations. In particular, we show that our approach yields interpretations that closely match those synthesized by methods offering global guarantees.
Submitted 21 August, 2025;
originally announced August 2025.
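The notion of Pareto-optimality over accuracy and explainability used in the abstract above can be illustrated with a simple dominance filter over a finite candidate set. This is a generic sketch with hypothetical names and made-up scores. It does not reproduce the paper's neighborhood-based local optimality check or its SAT encoding.

```python
from typing import List, Tuple

Candidate = Tuple[float, float]  # (accuracy, explainability), both maximized


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(
        x > y for x, y in zip(a, b)
    )


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    # Illustrative scores only, not results from the paper.
    scores = [(0.9, 0.2), (0.8, 0.5), (0.7, 0.4), (0.6, 0.9)]
    print(pareto_front(scores))
```

Here `(0.7, 0.4)` is dropped because `(0.8, 0.5)` beats it on both objectives. The remaining three candidates each trade accuracy against explainability.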
-
Trust@Health: A Trust-Based Multilayered Network for Scalable Healthcare Service Management
Authors:
Avijit Gayen,
Somyajit Chakraborty,
Joydeep Chakraborty,
Angshuman Jana
Abstract:
We study the intricate relationships within healthcare systems, focusing on interactions among doctors, departments, and hospitals. Leveraging an evolutionary graph framework, the proposed model emphasizes both intra-layer and inter-layer trust relationships to better understand and optimize healthcare services. The trust-based network facilitates the identification of key healthcare entities by integrating their social and professional interactions, culminating in a trust-based algorithm that quantifies the importance of these entities. Validation with a real-world dataset reveals a strong correlation (0.91) between the proposed trust measures and the ratings of hospitals and departments, though doctor ratings demonstrate skewed distributions due to potential biases. By modeling these relationships and trust dynamics, the framework supports scalable healthcare infrastructure, enabling effective patient referrals, personalized recommendations, and enhanced decision-making pathways.
Submitted 16 August, 2025;
originally announced August 2025.
-
Higher and extended Jacobi polynomials for codes
Authors:
Himadri Shekhar Chakraborty,
Tsuyoshi Miezaki
Abstract:
In this paper, we introduce Jacobi polynomial generalizations of several classical invariants in coding theory over finite fields, specifically, the higher and extended weight enumerators, and we establish explicit correspondences between the resulting Jacobi polynomials. Moreover, we present the Jacobi analogue of the MacWilliams identity for both higher and extended weight enumerators. We also show that the higher Jacobi polynomials for linear codes whose subcode supports form $t$-designs can be uniquely determined from the higher weight enumerators of the codes via the polarization technique. Finally, we demonstrate how higher Jacobi polynomials can be computed from harmonic higher weight enumerators with the help of Hahn polynomials.
Submitted 16 August, 2025;
originally announced August 2025.
-
Physics- and geometry-aware spatio-spectral graph neural operator for time-independent and time-dependent PDEs
Authors:
Subhankar Sarkar,
Souvik Chakraborty
Abstract:
Solving partial differential equations (PDEs) efficiently and accurately remains a cornerstone challenge in science and engineering, especially for problems involving complex geometries and limited labeled data. We introduce a Physics- and Geometry-Aware Spatio-Spectral Graph Neural Operator ($π$G-Sp$^2$GNO) for learning the solution operators of time-independent and time-dependent PDEs. The proposed approach first improves upon the recently developed Sp$^2$GNO by enabling geometry awareness and subsequently exploits the governing physics to learn the underlying solution operator in a simulation-free setup. While the spatio-spectral structure present in the proposed architecture allows multiscale learning, two separate strategies for enabling geometry awareness are introduced in this paper. For time-dependent problems, we also introduce a novel hybrid physics-informed loss function that combines a higher-order time-marching scheme with an upscaled-theory-inspired stochastic projection scheme. This allows accurate integration of the physics information into the loss function. The performance of the proposed approach is illustrated on a number of benchmark examples involving regular and complex domains, variation in geometry during inference, and time-independent and time-dependent problems. The results obtained illustrate the efficacy of the proposed approach as compared to the state-of-the-art physics-informed neural operator algorithms in the literature.
Submitted 13 August, 2025;
originally announced August 2025.
-
Scalable h-adaptive probabilistic solver for time-independent and time-dependent systems
Authors:
Akshay Thakur,
Sawan Kumar,
Matthew Zahr,
Souvik Chakraborty
Abstract:
Solving partial differential equations (PDEs) within the framework of probabilistic numerics offers a principled approach to quantifying epistemic uncertainty arising from discretization. By leveraging Gaussian process regression and imposing the governing PDE as a constraint at a finite set of collocation points, probabilistic numerics delivers mesh-free solutions at arbitrary locations. However, the high computational cost, which scales cubically with the number of collocation points, remains a critical bottleneck, particularly for large-scale or high-dimensional problems. We propose a scalable enhancement to this paradigm through two key innovations. First, we develop a stochastic dual descent algorithm that reduces the per-iteration complexity from cubic to linear in the number of collocation points, enabling tractable inference. Second, we exploit a clustering-based active learning strategy that adaptively selects collocation points to maximize information gain while minimizing computational expense. Together, these contributions result in an $h$-adaptive probabilistic solver that can scale to a large number of collocation points. We demonstrate the efficacy of the proposed solver on benchmark PDEs, including two- and three-dimensional steady-state elliptic problems, as well as a time-dependent parabolic PDE formulated in a space-time setting.
Submitted 14 August, 2025; v1 submitted 13 August, 2025;
originally announced August 2025.
-
MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization
Authors:
Ankan Deria,
Dwarikanath Mahapatra,
Behzad Bozorgtabar,
Mohna Chakraborty,
Snehashis Chakraborty,
Sudipta Roy
Abstract:
Virtual try-on seeks to generate photorealistic images of individuals in desired garments, a task that must simultaneously preserve personal identity and garment fidelity for practical use in fashion retail and personalization. However, existing methods typically handle upper and lower garments separately, rely on heavy preprocessing, and often fail to preserve person-specific cues such as tattoos, accessories, and body shape, resulting in limited realism and flexibility. To this end, we introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. Specifically, we propose three key modules: the Garment Representation Module (GRM) for capturing the semantics of both garments, the Person Representation Module (PRM) for encoding identity and pose cues, and the A-DiT fusion module, which integrates garment, person, and text-prompt features through a diffusion transformer. This architecture supports prompt-based customization, allowing fine-grained garment modifications with minimal user input. Extensive experiments on the VITON-HD and DressCode benchmarks demonstrate that MuGa-VTON outperforms existing methods in both qualitative and quantitative evaluations, producing high-fidelity, identity-preserving results suitable for real-world virtual try-on applications.
Submitted 11 August, 2025;
originally announced August 2025.
-
Benchmarking Federated Learning for Throughput Prediction in 5G Live Streaming Applications
Authors:
Yuvraj Dutta,
Soumyajit Chatterjee,
Sandip Chakraborty,
Basabdatta Palit
Abstract:
Accurate and adaptive network throughput prediction is essential for latency-sensitive and bandwidth-intensive applications in 5G and emerging 6G networks. However, most existing methods rely on centralized training with uniformly collected data, limiting their applicability in heterogeneous mobile environments with non-IID data distributions. This paper presents the first comprehensive benchmarking of federated learning (FL) strategies for throughput prediction in realistic 5G edge scenarios. We evaluate three aggregation algorithms - FedAvg, FedProx, and FedBN - across four time-series architectures: LSTM, CNN, CNN+LSTM, and Transformer, using five diverse real-world datasets. We systematically analyze the effects of client heterogeneity, cohort size, and history window length on prediction performance. Our results reveal key trade-offs among model complexities, convergence rates, and generalization. FedBN consistently delivers robust performance under non-IID conditions, while LSTM and Transformer models outperform CNN-based baselines by up to 80% in $R^2$ scores. Moreover, although Transformers converge in half the rounds of LSTM, they require longer history windows to achieve a high $R^2$, indicating higher context dependence. LSTM therefore achieves a favorable balance between accuracy, rounds, and temporal footprint. To validate the end-to-end applicability of the framework, we have integrated our FL-based predictors into a live adaptive streaming pipeline. FedBN-based LSTM and Transformer models improve mean QoE scores by 11.7% and 11.4%, respectively, over FedAvg, while also reducing the variance. These findings offer actionable insights for building scalable, privacy-preserving, and edge-aware throughput prediction systems in next-generation wireless networks.
Submitted 11 August, 2025;
originally announced August 2025.
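Of the three aggregation algorithms named in the abstract above, FedAvg is the simplest to illustrate: the server averages client parameters, weighting each client by its number of training samples. The sketch below uses flat lists of floats as stand-ins for model parameters. The function name and the toy values are assumptions for illustration and are not taken from the paper's benchmark code.

```python
from typing import List, Sequence


def fed_avg(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Sample-size-weighted average of client parameter vectors (FedAvg)."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two toy clients: the second holds three times as much data,
    # so the aggregate sits closer to its parameters.
    print(fed_avg([[1.0, 1.0], [3.0, 3.0]], [1, 3]))
```

FedProx and FedBN modify this baseline (a proximal term in the local objective, and keeping batch-normalization parameters local, respectively). Neither is shown here.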
-
On Understanding of the Dynamics of Model Capacity in Continual Learning
Authors:
Supriyo Chakraborty,
Krishnan Raghavan
Abstract:
The stability-plasticity dilemma, closely related to a neural network's (NN) capacity (its ability to represent tasks), is a fundamental challenge in continual learning (CL). Within this context, we introduce CL's effective model capacity (CLEMC) that characterizes the dynamic behavior of the stability-plasticity balance point. We develop a difference equation to model the evolution of the interplay between the NN, task data, and optimization procedure. We then leverage CLEMC to demonstrate that the effective capacity, and by extension the stability-plasticity balance point, is inherently non-stationary. We show that regardless of the NN architecture or optimization method, an NN's ability to represent new tasks diminishes when incoming task distributions differ from previous ones. We conduct extensive experiments to support our theoretical findings, spanning a range of architectures, from small feedforward and convolutional networks to medium-sized graph neural networks and transformer-based large language models with millions of parameters.
Submitted 14 August, 2025; v1 submitted 11 August, 2025;
originally announced August 2025.
-
Presburger Functional Synthesis: Complexity and Tractable Normal Forms
Authors:
S. Akshay,
A. R. Balasubramanian,
Supratik Chakraborty,
Georg Zetzsche
Abstract:
Given a relational specification between inputs and outputs as a logic formula, the problem of functional synthesis is to automatically synthesize a function from inputs to outputs satisfying the relation. Recently, a rich line of work has emerged tackling this problem for specifications in different theories, from Boolean to general first-order logic. In this paper, we launch an investigation of this problem for the theory of Presburger Arithmetic, which we call Presburger Functional Synthesis (PFnS). We show that PFnS can be solved in EXPTIME and provide a matching exponential lower bound. This is unlike the case for Boolean functional synthesis (BFnS), where only conditional exponential lower bounds are known. Further, we show that PFnS for one input and one output variable is as hard as BFnS in general. We then identify a special normal form, called PSyNF, for the specification formula that guarantees poly-time and poly-size solvability of PFnS. We prove several properties of PSyNF, including how to check and compile to this form, and conditions under which any other form that guarantees poly-time solvability of PFnS can be compiled in poly-time to PSyNF. Finally, we identify a syntactic normal form that is easier to check but is exponentially less succinct than PSyNF.
Submitted 10 August, 2025;
originally announced August 2025.
-
PROPS: Progressively Private Self-alignment of Large Language Models
Authors:
Noel Teku,
Fengwei Tian,
Payel Bhattacharjee,
Souradip Chakraborty,
Amrit Singh Bedi,
Ravi Tandon
Abstract:
Alignment is a key step in developing Large Language Models (LLMs) using human feedback to ensure adherence to human values and societal norms. Dependence on human feedback raises privacy concerns about how much a labeler's preferences may reveal about their personal values, beliefs, and personality traits. Existing approaches, such as Differentially Private SGD (DP-SGD), provide rigorous privacy guarantees by privatizing gradients during fine-tuning and alignment but can provide more privacy than necessary as human preferences are tied only to labels of (prompt, response) pairs and can degrade model utility. This work focuses on LLM alignment with preference-level privacy, which preserves the privacy of preference labels provided by humans. We propose PROPS (PROgressively Private Self-alignment), a multi-stage privacy preserving alignment framework where privately aligned models in previous stages can serve as labelers for supplementing training data in the subsequent stages of alignment. We present theoretical guarantees for PROPS as well as comprehensive validation using multiple models (Pythia and GPT) and datasets (AlpacaEval, Anthropic HH-RLHF, truthy-dpo-v0.1) to demonstrate the utility of PROPS over existing methods while still providing high privacy. For the same privacy budget, alignment via PROPS can achieve up to 3x higher win-rates compared to DP-SGD, and 2.5x higher win-rates compared to Randomized Response (RR) based alignment.
Submitted 8 August, 2025;
originally announced August 2025.
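The Randomized Response (RR) baseline named in the abstract can be illustrated with the textbook mechanism for binary preference labels. This is a minimal sketch, not the PROPS pipeline or the authors' exact baseline; the function name `randomized_response` and the choice of epsilon are assumptions made for illustration.

```python
import math
import random

def randomized_response(label, epsilon, rng):
    """Textbook binary randomized response (illustrative, not the PROPS method).

    Keeps the 0/1 label with probability e^eps / (1 + e^eps), otherwise flips it.
    """
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return label if rng.random() < keep_prob else 1 - label

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    true_labels = [1, 0, 1, 1, 0]  # hypothetical preference labels
    noisy = [randomized_response(y, epsilon=1.0, rng=rng) for y in true_labels]
    print(noisy)
```

A smaller epsilon flips labels more often, which is the privacy-utility trade-off the abstract compares against.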
-
Hybrid oscillator-qudit quantum processors: stabilizer states and symplectic operations
Authors:
Sayan Chakraborty,
Victor V. Albert
Abstract:
We construct stabilizer states and error-correcting codes on combinations of discrete- and continuous-variable systems, generalizing the Gottesman-Kitaev-Preskill (GKP) quantum lattice formalism. Our framework absorbs the discrete phase space of a qudit into a hybrid phase space parameterizable entirely by the continuous variables of a harmonic oscillator. The unit cell of a hybrid quantum lattice grows with the qudit dimension, yielding a way to simultaneously measure an arbitrarily large range of non-commuting position and momentum displacements. Simple hybrid states can be obtained by applying a conditional displacement to a Gottesman-Kitaev-Preskill (GKP) state and a Pauli eigenstate, or by encoding some of the physical qudits of a stabilizer state into a GKP code. The states' oscillator-qudit entanglement cannot be generated using symplectic (i.e., Gaussian-Clifford) operations, distinguishing them as a resource from tensor products of oscillator and qudit stabilizer states. We construct general hybrid error-correcting codes by relating stabilizer codes to non-commutative tori and obtaining logical operators via Morita equivalence. We provide examples using commutation matrices, integer symplectic matrices, and binary codes.
Submitted 6 August, 2025;
originally announced August 2025.
-
Measuring and Predicting Where and When Pathologists Focus their Visual Attention while Grading Whole Slide Images of Cancer
Authors:
Souradeep Chakraborty,
Ruoyu Xue,
Rajarsi Gupta,
Oksana Yaskiv,
Constantin Friedman,
Natallia Sheuka,
Dana Perez,
Paul Friedman,
Won-Tak Choi,
Waqas Mahmud,
Beatrice Knudsen,
Gregory Zelinsky,
Joel Saltz,
Dimitris Samaras
Abstract:
The ability to predict the attention of expert pathologists could lead to decision support systems for better pathology training. We developed methods to predict the spatio-temporal (where and when) movements of pathologists' attention as they grade whole slide images (WSIs) of prostate cancer. We characterize a pathologist's attention trajectory by their x, y, and m (magnification) movements of a viewport as they navigate WSIs using a digital microscope. This information was obtained from 43 pathologists across 123 WSIs, and we consider the task of predicting the pathologist attention scanpaths constructed from the viewport centers. We introduce a fixation extraction algorithm that simplifies an attention trajectory by extracting fixations in the pathologist's viewing while preserving semantic information, and we use these pre-processed data to train and test a two-stage model to predict the dynamic (scanpath) allocation of attention during WSI reading via intermediate attention heatmap prediction. In the first stage, a transformer-based sub-network predicts the attention heatmaps (static attention) across different magnifications. In the second stage, we predict the attention scanpath by sequentially modeling the next fixation points in an autoregressive manner using a transformer-based approach, starting at the WSI center and leveraging multi-magnification feature representations from the first stage. Experimental results show that our scanpath prediction model outperforms chance and baseline models. Tools developed from this model could assist pathology trainees in learning to allocate their attention during WSI reading like an expert.
Submitted 3 August, 2025;
originally announced August 2025.
-
Private key and password protection by steganographic image encryption
Authors:
Debesh Choudhury,
Sujoy Chakraborty
Abstract:
We propose a technique to protect and preserve a private key or a passcode in an encrypted two-dimensional graphical image. The plaintext private key or the passcode is converted into an encrypted QR code and embedded into a real-life color image with a steganographic scheme. The private key or the passcode is recovered from the stego color image by first extracting the encrypted QR code from the color image, followed by decryption of the QR code. The cryptographic key for encryption of the QR code is generated from the output of a Linear Feedback Shift Register (LFSR), initialized by a seed image chosen by the user. The user can store the seed image securely, without the knowledge of an attacker. Even if an active attacker modifies the seed image (without knowledge of the fact that it is the seed image), the user can easily restore it if he/she keeps multiple copies of it, so that the encryption key can be regenerated easily. Our experiments prove the feasibility of the technique using sample private key data and real-life color images.
Submitted 5 June, 2025;
originally announced July 2025.
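The abstract derives the encryption key from the output of a Linear Feedback Shift Register (LFSR). As a rough illustration of how an LFSR produces a bit stream, here is a minimal 16-bit Fibonacci LFSR using the widely cited taps 16, 14, 13, 11. The register width, the taps, and the integer seed are assumptions; the abstract does not say how the seed image is mapped to an initial state, so that step is left out.

```python
def lfsr16_bits(seed, count):
    """Generate `count` output bits from a 16-bit Fibonacci LFSR.

    Taps 16, 14, 13, 11 (polynomial x^16 + x^14 + x^13 + x^11 + 1) are an
    illustrative choice, not taken from the paper.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero in its low 16 bits")
    bits = []
    for _ in range(count):
        bits.append(state & 1)  # output the least significant bit
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)  # shift right, insert feedback at the top
    return bits

if __name__ == "__main__":
    keystream = lfsr16_bits(0xACE1, 32)  # 0xACE1 is a hypothetical seed
    print("".join(str(b) for b in keystream))
```

The same seed always regenerates the same stream, which is why restoring the seed image is enough to regenerate the key.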
-
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
Authors:
Xiao Fang,
Minhyek Jeon,
Zheyang Qin,
Stanislav Panev,
Celso de Melo,
Shuowen Hu,
Shayok Chakraborty,
Fernando De la Torre
Abstract:
Detecting vehicles in aerial imagery is a critical task with applications in traffic monitoring, urban planning, and defense intelligence. Deep learning methods have provided state-of-the-art (SOTA) results for this application. However, a significant challenge arises when models trained on data from one geographic region fail to generalize effectively to other areas. Variability in factors such as environmental conditions, urban layouts, road networks, vehicle types, and image acquisition parameters (e.g., resolution, lighting, and angle) leads to domain shifts that degrade model performance. This paper proposes a novel method that uses generative AI to synthesize high-quality aerial images and their labels, improving detector training through data augmentation. Our key contribution is the development of a multi-stage, multi-modal knowledge transfer framework utilizing fine-tuned latent diffusion models (LDMs) to mitigate the distribution gap between the source and target environments. Extensive experiments across diverse aerial imagery domains show consistent performance improvements in AP50 over supervised learning on source domain data, weakly supervised adaptation methods, unsupervised domain adaptation methods, and open-set object detectors by 4-23%, 6-10%, 7-40%, and more than 50%, respectively. Furthermore, we introduce two newly annotated aerial datasets from New Zealand and Utah to support further research in this field. Project page is available at: https://humansensinglab.github.io/AGenDA
Submitted 28 July, 2025;
originally announced July 2025.
-
Error-Aware Curriculum Learning for Biomedical Relation Classification
Authors:
Sinchani Chakraborty,
Sudeshna Sarkar,
Pawan Goyal
Abstract:
Relation Classification (RC) in biomedical texts is essential for constructing knowledge graphs and enabling applications such as drug repurposing and clinical decision-making. We propose an error-aware teacher-student framework that improves RC through structured guidance from a large language model (GPT-4o). Prediction failures from a baseline student model are analyzed by the teacher to classify error types, assign difficulty scores, and generate targeted remediations, including sentence rewrites and suggestions for KG-based enrichment. These enriched annotations are used to train a first student model via instruction tuning. This model then annotates a broader dataset with difficulty scores and remediation-enhanced inputs. A second student is subsequently trained via curriculum learning on this dataset, ordered by difficulty, to promote robust and progressive learning. We also construct a heterogeneous biomedical knowledge graph from PubMed abstracts to support context-aware RC. Our approach achieves new state-of-the-art performance on 4 of 5 PPI datasets and the DDI dataset, while remaining competitive on ChemProt.
Submitted 18 July, 2025;
originally announced July 2025.
-
Fast computational deep thermalization
Authors:
Shantanav Chakraborty,
Soonwon Choi,
Soumik Ghosh,
Tudor Giurgică-Tiron
Abstract:
Deep thermalization refers to the emergence of Haar-like randomness from quantum systems upon partial measurements. As a generalization of quantum thermalization, it is often associated with high complexity and entanglement. Here, we introduce computational deep thermalization and construct the fastest possible dynamics exhibiting it at infinite effective temperature. Our circuit dynamics produce quantum states with low entanglement in polylogarithmic depth that are indistinguishable from Haar random states to any computationally bounded observer. Importantly, the observer is allowed to request many copies of the same residual state obtained from partial projective measurements on the state -- this condition is beyond the standard settings of quantum pseudorandomness, but natural for deep thermalization. In cryptographic terms, these states are pseudorandom, pseudoentangled, and crucially, retain these properties under local measurements. Our results demonstrate a new form of computational thermalization, where thermal-like behavior arises from structured quantum states endowed with cryptographic properties, instead of from highly unstructured ensembles. The low resource complexity of preparing these states suggests scalable simulations of deep thermalization using quantum computers. Our work also motivates the study of computational quantum pseudorandomness beyond BQP observers.
Submitted 18 July, 2025;
originally announced July 2025.
-
Counting Answer Sets of Disjunctive Answer Set Programs
Authors:
Mohimenul Kabir,
Supratik Chakraborty,
Kuldeep S Meel
Abstract:
Answer Set Programming (ASP) provides a powerful declarative paradigm for knowledge representation and reasoning. Recently, counting answer sets has emerged as an important computational problem with applications in probabilistic reasoning, network reliability analysis, and other domains. This has motivated significant research into designing efficient ASP counters. While substantial progress has been made for normal logic programs, the development of practical counters for disjunctive logic programs remains challenging.
We present SharpASP-SR, a novel framework for counting answer sets of disjunctive logic programs based on subtractive reduction to projected propositional model counting. Our approach introduces an alternative characterization of answer sets that enables efficient reduction while ensuring that intermediate representations remain of polynomial size. This allows SharpASP-SR to leverage recent advances in projected model counting technology. Through extensive experimental evaluation on diverse benchmarks, we demonstrate that SharpASP-SR significantly outperforms existing counters on instances with large answer set counts. Building on these results, we develop a hybrid counting approach that combines enumeration techniques with SharpASP-SR to achieve state-of-the-art performance across the full spectrum of disjunctive programs.
Submitted 15 July, 2025;
originally announced July 2025.
-
Distribution-Free Uncertainty-Aware Virtual Sensing via Conformalized Neural Operators
Authors:
Kazuma Kobayashi,
Shailesh Garg,
Farid Ahmed,
Souvik Chakraborty,
Syed Bahauddin Alam
Abstract:
Robust uncertainty quantification (UQ) remains a critical barrier to the safe deployment of deep learning in real-time virtual sensing, particularly in high-stakes domains where sparse, noisy, or non-collocated sensor data are the norm. We introduce the Conformalized Monte Carlo Operator (CMCO), a framework that transforms neural operator-based virtual sensing with calibrated, distribution-free prediction intervals. By unifying Monte Carlo dropout with split conformal prediction in a single DeepONet architecture, CMCO achieves spatially resolved uncertainty estimates without retraining, ensembling, or custom loss design. Our method addresses a longstanding challenge: how to endow operator learning with efficient and reliable UQ across heterogeneous domains. Through rigorous evaluation on three distinct applications (turbulent flow, elastoplastic deformation, and global cosmic radiation dose estimation), CMCO consistently attains near-nominal empirical coverage, even in settings with strong spatial gradients and proxy-based sensing. This breakthrough offers a general-purpose, plug-and-play UQ solution for neural operators, unlocking real-time, trustworthy inference in digital twins, sensor fusion, and safety-critical monitoring. By bridging theory and deployment with minimal computational overhead, CMCO establishes a new foundation for scalable, generalizable, and uncertainty-aware scientific machine learning.
Submitted 15 July, 2025;
originally announced July 2025.
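The split conformal prediction step mentioned in the abstract can be sketched generically. The code below is the standard split conformal recipe with absolute residuals as the score. It is not the CMCO implementation: the Monte Carlo dropout and DeepONet parts are omitted, and the function names and numbers are hypothetical.

```python
import math

def conformal_quantile(residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration residual.

    `residuals` are absolute errors |y - prediction| on a held-out calibration set.
    If the calibration set is too small for the requested level, return infinity.
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(residuals)[k - 1]

def prediction_interval(prediction, q):
    """Symmetric interval around a point prediction."""
    return (prediction - q, prediction + q)

if __name__ == "__main__":
    calibration_residuals = [0.1, 0.4, 0.2, 0.3, 0.5, 0.6, 0.7]  # made-up values
    q = conformal_quantile(calibration_residuals, alpha=0.25)
    print(prediction_interval(2.0, q))
```

Because the quantile is computed only from held-out residuals, the underlying model does not need retraining, which matches the "without retraining" property the abstract highlights.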
-
Modeling Heterogeneity across Varying Spatial Extents: Discovering Linkages between Sea Ice Retreat and Ice Shelve Melt in the Antarctic
Authors:
Maloy Kumar Devnath,
Sudip Chakraborty,
Vandana P. Janeja
Abstract:
Spatial phenomena often exhibit heterogeneity across spatial extents and in proximity, making them complex to model, especially in dynamic regions like ice shelves and sea ice. In this study, we address this challenge by exploring the linkages between sea ice retreat and Antarctic ice shelf (AIS) melt. Although atmospheric forcing and basal melting have been widely studied, the direct impact of sea ice retreat on AIS mass loss remains underexplored. Traditional models treat sea ice and AIS as separate systems, which limits their ability to capture localized linkages and cascading feedback. To overcome this, we propose Spatial-Link, a novel graph-based framework that quantifies spatial heterogeneity to capture linkages between sea ice retreat and AIS melt. Our method constructs a spatial graph using Delaunay triangulation of satellite-derived ice change matrices, where nodes represent regions of significant change and edges encode proximity and directional consistency. We extract and statistically validate linkage paths using breadth-first search and Monte Carlo simulations. Results reveal non-local, spatially heterogeneous coupling patterns, suggesting sea ice loss can initiate or amplify downstream AIS melt. Our analysis shows how sea ice retreat evolves over an oceanic grid and progresses toward ice shelves, establishing a direct linkage. To our knowledge, this is the first proposed methodology linking sea ice retreat to AIS melt. Spatial-Link offers a scalable, data-driven tool to improve sea-level rise projections and inform climate adaptation strategies.
Submitted 18 June, 2025;
originally announced July 2025.
-
Scaling Transformers for Time Series Forecasting: Do Pretrained Large Models Outperform Small-Scale Alternatives?
Authors:
Sanjay Chakraborty,
Ibrahim Delibasoglu,
Fredrik Heintz
Abstract:
Large pre-trained models have demonstrated remarkable capabilities across domains, but their effectiveness in time series forecasting remains understudied. This work empirically examines whether pre-trained large-scale time series models (LSTSMs) trained on diverse datasets can outperform traditional non-pretrained small-scale transformers in forecasting tasks. We analyze state-of-the-art (SOTA) pre-trained universal time series models (e.g., Moirai, TimeGPT) alongside conventional transformers, evaluating accuracy, computational efficiency, and interpretability across multiple benchmarks. Our findings reveal the strengths and limitations of pre-trained LSTSMs, providing insights into their suitability for time series tasks compared to task-specific small-scale architectures. The results highlight scenarios where pretraining offers advantages and where simpler models remain competitive.
Submitted 24 June, 2025;
originally announced July 2025.
-
From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems
Authors:
Sawan Kumar,
Tapas Tripura,
Rajdip Nayek,
Souvik Chakraborty
Abstract:
Operator learning offers a powerful paradigm for solving parametric partial differential equations (PDEs), but scaling probabilistic neural operators such as the recently proposed Gaussian Processes Operators (GPOs) to high-dimensional, data-intensive regimes remains a significant challenge. In this work, we introduce a novel, scalable GPO, which capitalizes on sparsity, locality, and structural information through judicious kernel design. Addressing the fundamental limitation of cubic computational complexity, our method leverages nearest-neighbor-based local kernel approximations in the spatial domain, sparse kernel approximation in the parameter space, and structured Kronecker factorizations to enable tractable inference on large-scale datasets and high-dimensional input. While local approximations often introduce accuracy trade-offs due to limited kernel interactions, we overcome this by embedding operator-aware kernel structures and employing expressive, task-informed mean functions derived from neural operator architectures. Through extensive evaluations on a broad class of nonlinear PDEs - including Navier-Stokes, wave advection, Darcy flow, and Burgers' equations - we demonstrate that our framework consistently achieves high accuracy across varying discretization scales. These results underscore the potential of our approach to bridge the gap between scalability and fidelity in GPO, offering a compelling foundation for uncertainty-aware modeling in complex physical systems.
Submitted 18 June, 2025;
originally announced June 2025.
-
Shortest Paths in a Weighted Simplicial Complex
Authors:
Sukrit Chakraborty,
Prasanta Choudhury,
Arindam Mukherjee
Abstract:
Simplicial complexes are extensively studied in the field of algebraic topology. They have gained attention in recent times due to their applications in fields like theoretical distributed computing and simplicial neural networks. Graphs are one-dimensional simplicial complexes. Graph theory has applications in topics like theoretical computer science, operations research, bioinformatics and social sciences. This makes it natural to try to adapt graph-theoretic results for simplicial complexes, which can model more intricate and detailed structures appearing in real-world systems. In this article, we define the concept of weighted simplicial complex and $d$-path in a simplicial complex. Both these concepts have the potential to have numerous real-life applications. Next, we provide a novel algorithm to find the shortest paths in a weighted simplicial complex. The core principles of our algorithm align with those of Dijkstra's algorithm for graphs. Hence, this work lays another brick for the sake of integrating graph-theoretic concepts with abstract simplicial complexes.
Submitted 15 June, 2025;
originally announced June 2025.
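For reference, the graph case that the abstract builds on is classical Dijkstra. The sketch below covers only ordinary weighted graphs with non-negative weights (the one-dimensional case); it does not implement the paper's $d$-paths or its simplicial algorithm, and the adjacency format and example graph are assumptions.

```python
import heapq

def dijkstra(adjacency, source):
    """Shortest distances from `source` in a graph with non-negative edge weights.

    `adjacency` maps each vertex to a list of (neighbor, weight) pairs.
    Unreachable vertices are absent from the returned dictionary.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter route to u was already found
        for v, w in adjacency.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist

if __name__ == "__main__":
    graph = {  # hypothetical undirected triangle, listed in both directions
        "a": [("b", 1), ("c", 5)],
        "b": [("a", 1), ("c", 2)],
        "c": [("a", 5), ("b", 2)],
    }
    print(dijkstra(graph, "a"))
```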
-
Assessing the Quality of Binomial Samplers: A Statistical Distance Framework
Authors:
Uddalok Sarkar,
Sourav Chakraborty,
Kuldeep S. Meel
Abstract:
Randomized algorithms depend on accurate sampling from probability distributions, as their correctness and performance hinge on the quality of the generated samples. However, even for common distributions like Binomial, exact sampling is computationally challenging, leading standard library implementations to rely on heuristics. These heuristics, while efficient, suffer from approximation and system representation errors, causing deviations from the ideal distribution. Although seemingly minor, such deviations can accumulate in downstream applications requiring large-scale sampling, potentially undermining algorithmic guarantees. In this work, we propose statistical distance as a robust metric for analyzing the quality of Binomial samplers, quantifying deviations from the ideal distribution. We derive rigorous bounds on the statistical distance for standard implementations and demonstrate the practical utility of our framework by enhancing APSEst, a DNF model counter, with improved reliability and error guarantees. To support practical adoption, we propose an interface extension that allows users to control and monitor statistical distance via explicit input/output parameters. Our findings emphasize the critical need for thorough and systematic error analysis in sampler design. As the first work to focus exclusively on Binomial samplers, our approach lays the groundwork for extending rigorous analysis to other common distributions, opening avenues for more robust and reliable randomized algorithms.
Submitted 31 May, 2025;
originally announced June 2025.
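To make the metric concrete, the sketch below computes the statistical distance between an exact Binomial distribution and a candidate sampler's distribution, assuming "statistical distance" means total variation distance (half the L1 distance between the probability mass functions). The function names and the skewed example distribution are made up for illustration and do not come from the paper.

```python
from math import comb

def binomial_pmf(n, p):
    """Exact Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def statistical_distance(pmf_a, pmf_b):
    """Total variation distance between two pmfs on the same support."""
    if len(pmf_a) != len(pmf_b):
        raise ValueError("pmfs must be defined on the same support")
    return 0.5 * sum(abs(a - b) for a, b in zip(pmf_a, pmf_b))

if __name__ == "__main__":
    ideal = binomial_pmf(4, 0.5)
    # Hypothetical sampler that never returns 4 and over-produces 3 instead.
    skewed = [0.0625, 0.25, 0.375, 0.3125, 0.0]
    print(statistical_distance(ideal, skewed))
```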
-
Data-Centric Safety and Ethical Measures for Data and AI Governance
Authors:
Srija Chakraborty
Abstract:
Datasets play a key role in imparting advanced capabilities to artificial intelligence (AI) foundation models that can be adapted to various downstream tasks. These downstream applications can introduce both beneficial and harmful capabilities -- resulting in dual-use AI foundation models, with various technical and regulatory approaches to monitor and manage these risks. However, despite the crucial role of datasets, responsible dataset design and ensuring data-centric safety and ethical practices have received less attention. In this study, we propose a responsible dataset design framework that encompasses various stages in the AI and dataset lifecycle to enhance safety measures and reduce the risk of AI misuse due to low-quality, unsafe, and unethical data content. This framework is domain agnostic, suitable for adoption for various applications and can promote responsible practices in dataset creation, use, and sharing to facilitate red teaming, minimize risks, and increase trust in AI models.
Submitted 30 June, 2025; v1 submitted 11 June, 2025;
originally announced June 2025.
-
Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
Authors:
Andrei Mircea,
Supriyo Chakraborty,
Nima Chitsazan,
Milind Naphade,
Sambit Sahu,
Irina Rish,
Ekaterina Lobacheva
Abstract:
This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training: an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl
Submitted 14 July, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models
Authors:
Soumya Suvra Ghosal,
Souradip Chakraborty,
Avinash Reddy,
Yifu Lu,
Mengdi Wang,
Dinesh Manocha,
Furong Huang,
Mohammad Ghavamzadeh,
Amrit Singh Bedi
Abstract:
Recent trends in test-time scaling for reasoning models (e.g., OpenAI o1, DeepSeek R1) have led to a popular belief that extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural question: Does thinking more at test-time truly lead to better reasoning? To answer this question, we perform a detailed empirical study across models and benchmarks, which reveals a consistent pattern of initial performance improvements from additional thinking followed by a decline, due to "overthinking". To understand this non-monotonic trend, we consider a simple probabilistic model, which reveals that additional thinking increases output variance, creating an illusion of improved reasoning while ultimately undermining precision. Thus, observed gains from "more thinking" are not true indicators of improved reasoning, but artifacts stemming from the connection between model uncertainty and the evaluation metric. This suggests that test-time scaling through extended thinking is not an effective way to utilize the inference thinking budget. Recognizing these limitations, we introduce an alternative test-time scaling approach, parallel thinking, inspired by Best-of-N sampling. Our method generates multiple independent reasoning paths within the same inference budget and selects the most consistent response via majority vote, achieving up to 20% higher accuracy compared to extended thinking. This provides a simple yet effective mechanism for test-time scaling of reasoning models.
Submitted 13 June, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
Authors:
Mohammad Saqib Hasan,
Saikat Chakraborty,
Santu Karmaker,
Niranjan Balasubramanian
Abstract:
LLM-generated code often contains security issues. We address two key challenges in improving secure code generation. First, obtaining high-quality training data covering a broad set of security issues is critical. To address this, we introduce a method for distilling a preference dataset of insecure and secure code pairs from frontier LLMs, along with security reasoning that explains the issues and the fix. The key idea here is to make use of security knowledge sources to devise a systematic prompting strategy that ensures broad coverage. Second, aligning models to secure code requires focusing on localized regions of code. Direct preference optimization methods, like SimPO, are not designed to handle these localized differences and turn out to be ineffective. We address this with a new localized preference optimization algorithm that masks the security-related tokens in both the winning (secure) and losing (insecure) responses. To prevent loss in code quality, we also add a regularizer. Evaluations show that both training on our dataset, DiSCo, and the new preference optimization algorithm, LPO, yield substantial reductions in code insecurity while also improving overall code quality. Code and dataset are available at https://github.com/StonyBrookNLP/disco-lpo.
Submitted 10 September, 2025; v1 submitted 31 May, 2025;
originally announced June 2025.
-
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time
Authors:
Mohamad Chehade,
Soumya Suvra Ghosal,
Souradip Chakraborty,
Avinash Reddy,
Dinesh Manocha,
Hao Zhu,
Amrit Singh Bedi
Abstract:
Aligning large language models with humans is challenging due to the inherently multifaceted nature of preference feedback. While existing approaches typically frame this as a multi-objective optimization problem, they often overlook how humans actually make decisions. Research on bounded rationality suggests that human decision making follows satisficing strategies: optimizing primary objectives while ensuring others meet acceptable thresholds. To bridge this gap and operationalize the notion of satisficing alignment, we propose SITAlign: an inference-time framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds of our satisficing-based inference alignment approach. We empirically validate SITAlign's performance through extensive experimentation on multiple benchmarks. For instance, on the PKU-SafeRLHF dataset with the primary objective of maximizing helpfulness while ensuring a threshold on harmlessness, SITAlign outperforms the state-of-the-art multi-objective decoding strategy by a margin of 22.3% in terms of GPT-4 win-tie rate for helpfulness reward while adhering to the threshold on harmlessness.
Submitted 31 May, 2025; v1 submitted 29 May, 2025;
originally announced May 2025.
-
LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model
Authors:
Avijit Gayen,
Somyajit Chakraborty,
Mainak Sen,
Soham Paul,
Angshuman Jana
Abstract:
The persistent accumulation of unresolved legal cases, especially within the Indian judiciary, significantly hampers the timely delivery of justice. Manual methods of prioritizing petitions are often prone to inefficiencies and subjective biases, further exacerbating delays. To address this issue, we propose LLMPR (Large Language Model-based Petition Ranking), an automated framework that utilizes transfer learning and machine learning to assign priority rankings to legal petitions based on their contextual urgency. Leveraging the ILDC dataset comprising 7,593 annotated petitions, we process unstructured legal text and extract features through various embedding techniques, including DistilBERT, LegalBERT, and MiniLM. These textual embeddings are combined with quantitative indicators such as gap days, rank scores, and word counts to train multiple machine learning models, including Random Forest, Decision Tree, XGBoost, LightGBM, and CatBoost. Our experiments demonstrate that Random Forest and Decision Tree models yield superior performance, with accuracy exceeding 99% and a Spearman rank correlation of 0.99. Notably, models using only numerical features achieve nearly optimal ranking results ($R^2 = 0.988$, $\rho = 0.998$), while LLM-based embeddings offer only marginal gains. These findings suggest that automated petition ranking can effectively streamline judicial workflows, reduce case backlog, and improve fairness in legal prioritization.
Submitted 27 May, 2025;
originally announced May 2025.
-
GPUMC: A Stateless Model Checker for GPU Weak Memory Concurrency
Authors:
Soham Chakraborty,
S. Krishna,
Andreas Pavlogiannis,
Omkar Tuppe
Abstract:
GPU computing is embracing weak memory concurrency for performance improvement. However, compared to CPUs, modern GPUs provide more fine-grained concurrency features such as scopes, have additional properties like divergence, and thereby follow different weak memory consistency models. These features and properties make concurrent programming on GPUs more complex and error-prone. To this end, we present GPUMC, a stateless model checker to check the correctness of GPU shared-memory concurrent programs under the scoped-RC11 weak memory concurrency model. GPUMC explores all possible executions in GPU programs to reveal various errors: races, barrier divergence, and assertion violations. GPUMC also automatically repairs these errors in the appropriate cases.
We evaluate GPUMC with benchmarks and real-life GPU programs. GPUMC is efficient both in time and memory in verifying large GPU programs where state-of-the-art tools time out. In addition, GPUMC identifies all known errors in these benchmarks compared to the state-of-the-art tools.
Submitted 26 May, 2025;
originally announced May 2025.
-
GAIA: A Foundation Model for Operational Atmospheric Dynamics
Authors:
Ata Akbari Asanjan,
Olivia Alexander,
Tom Berg,
Clara Zhang,
Matt Yang,
Jad Makki,
Disha Shidham,
Srija Chakraborty,
William Bender,
Stephen Peng,
Arun Ravindran,
Olivier Raiman,
David Potere,
David Bell
Abstract:
We present the GAIA (Geospatial Artificial Intelligence for Atmospheres) Foundation Model, a novel model that combines masked autoencoders (MAE) and self-DIstillation with NO labels (DINO) for analyzing global atmospheric patterns in satellite imagery. By integrating these complementary self-supervised learning approaches, our model simultaneously captures both local features and global dependencies. We address two critical challenges in satellite data analysis: reconstructing missing regions and estimating precipitation patterns as our first downstream tasks. The model demonstrates superior temporal pattern capture compared to standard MAE approaches, while maintaining robust performance in downstream tasks. Our experimental results show strong gap-filling capabilities across varying mask ratios and accurate precipitation estimation with limited training data, achieving a false alarm ratio of 0.088 and structural similarity of 0.881. This work represents an advancement in self-supervised learning for atmospheric science, providing a foundation for improved weather monitoring and climate analysis. The trained model weights and accompanying code are publicly available as open-source on Hugging Face here: https://huggingface.co/bcg-usra-nasa-gaia/GAIA-v1.
Submitted 15 May, 2025;
originally announced May 2025.
-
Critique-Guided Distillation for Efficient and Robust Language Model Reasoning
Authors:
Berkcan Kapusuzoglu,
Supriyo Chakraborty,
Chia-Hsuan Lee,
Sambit Sahu
Abstract:
Supervised fine-tuning (SFT) with expert demonstrations often suffers from the imitation problem, where models reproduce correct responses without internalizing the underlying reasoning. We propose Critique-Guided Distillation (CGD), a multi-stage training framework that augments SFT with teacher-generated explanatory critiques and refined responses. Instead of directly imitating teacher outputs, a student learns to map the triplet of prompt, its own initial response, and teacher critique into the refined teacher response, thereby capturing both what to output and why. Our analyses show that CGD consistently reduces refinement uncertainty, improves alignment between critiques and responses, and enhances sample efficiency. On reasoning benchmarks, CGD achieves substantial gains across LLaMA and Qwen families, including +15.0% on AMC23 and +12.2% on MATH-500, while avoiding the format drift issues observed in prior critique-based fine-tuning. Importantly, on LLaMA-3.1-8B CGD approaches or exceeds the performance of SimpleRL-Zero, which is a DeepSeek-R1 replication, while requiring 60x less compute. Beyond reasoning, CGD maintains or improves general instruction-following and factual accuracy, matching baseline performance on IFEval, MUSR, TruthfulQA, and BBH. In contrast, prior critique-based methods degrade these capabilities (e.g., -21% on IFEval). Taken together, these results establish CGD as a robust and generalizable alternative to both conventional SFT and RL-based methods, offering a more efficient path toward advancing the reasoning and safety of large language models.
Submitted 26 September, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
An Optimized Evacuation Plan for an Active-Shooter Situation Constrained by Network Capacity
Authors:
Joseph Lavalle-Rivera,
Aniirudh Ramesh,
Subhadeep Chakraborty
Abstract:
More than 3400 public shootings occurred in the United States between 2016 and 2022. Among these, 25.1% took place in an educational institution, 29.4% at the workplace including office buildings, 19.6% in retail store locations, and 13.4% in restaurants and bars. During these critical scenarios, making the right decisions while evacuating can make the difference between life and death. However, emergency evacuation is intensely stressful, which along with the lack of verifiable real-time information may lead to fatal incorrect decisions. To tackle this problem, we developed a multi-route routing optimization algorithm that determines multiple optimal safe routes for each evacuee while accounting for available capacity along the route, thus reducing the threat of crowding and bottlenecking. Overall, our algorithm reduces the total casualties by 34.16% and 53.3%, compared to our previous routing algorithm without capacity constraints and an expert-advised routing strategy respectively. Further, our approach to reducing crowding resulted in an approximate 50% reduction in occupancy in key bottlenecking nodes compared to both of the other evacuation algorithms.
Submitted 29 April, 2025;
originally announced May 2025.
-
Simulating quantum collision models with Hamiltonian simulations using early fault-tolerant quantum computers
Authors:
Kushagra Garg,
Zeeshan Ahmed,
Subhadip Mitra,
Shantanav Chakraborty
Abstract:
We develop randomized quantum algorithms to simulate quantum collision models, also known as repeated interaction schemes, which provide a rich framework to model various open-system dynamics. The underlying technique involves composing time evolutions of the total (system, bath, and interaction) Hamiltonian and intermittent tracing out of the environment degrees of freedom. This results in a unified framework where any near-term Hamiltonian simulation algorithm can be incorporated to implement an arbitrary number of such collisions on early fault-tolerant quantum computers: we do not assume access to specialized oracles such as block encodings and minimize the number of ancilla qubits needed. In particular, using the correspondence between Lindbladian evolution and completely positive trace-preserving maps arising out of memoryless collisions, we provide an end-to-end quantum algorithm for simulating Lindbladian dynamics. For a system of $n$-qubits, we exhaustively compare the circuit depth needed to estimate the expectation value of an observable with respect to the reduced state of the system after time $t$ while employing different near-term Hamiltonian simulation techniques, requiring at most $n+2$ qubits in all. We compare the CNOT gate counts of the various approaches for estimating the Transverse Field Magnetization of a $10$-qubit XX-Heisenberg spin chain under amplitude damping. Finally, we also develop a framework to efficiently simulate an arbitrary number of memory-retaining collisions, i.e., where environments interact, leading to non-Markovian dynamics. Overall, our methods can leverage quantum collision models for both Markovian and non-Markovian dynamics on early fault-tolerant quantum computers, shedding light on the advantages and limitations of simulating open systems dynamics using this framework.
Submitted 19 August, 2025; v1 submitted 30 April, 2025;
originally announced April 2025.
-
A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces
Authors:
Juliana Barbosa,
Ulhas Gondhali,
Gohar Petrossian,
Kinshuk Sharma,
Sunandan Chakraborty,
Jennifer Jacquet,
Juliana Freire
Abstract:
Wildlife trafficking remains a critical global issue, significantly impacting biodiversity, ecological stability, and public health. Despite efforts to combat this illicit trade, the rise of e-commerce platforms has made it easier to sell wildlife products, putting new pressure on wild populations of endangered and threatened species. The use of these platforms also opens a new opportunity: as criminals sell wildlife products online, they leave digital traces of their activity that can provide insights into trafficking activities as well as how they can be disrupted. The challenge lies in finding these traces. Online marketplaces publish ads for a plethora of products, and identifying ads for wildlife-related products is like finding a needle in a haystack. Learning classifiers can automate ad identification, but creating them requires costly, time-consuming data labeling that hinders support for diverse ads and research questions. This paper addresses a critical challenge in the data science pipeline for wildlife trafficking analytics: generating quality labeled data for classifiers that select relevant data. While large language models (LLMs) can directly label advertisements, doing so at scale is prohibitively expensive. We propose a cost-effective strategy that leverages LLMs to generate pseudo labels for a small sample of the data and uses these labels to create specialized classification models. Our novel method automatically gathers diverse and representative samples to be labeled while minimizing the labeling costs. Our experimental evaluation shows that our classifiers achieve up to 95% F1 score, outperforming LLMs at a lower cost. We present real use cases that demonstrate the effectiveness of our approach in enabling analyses of different aspects of wildlife trafficking.
Submitted 29 April, 2025;
originally announced April 2025.
-
RadarTrack: Enhancing Ego-Vehicle Speed Estimation with Single-chip mmWave Radar
Authors:
Argha Sen,
Soham Chakraborty,
Soham Tripathy,
Sandip Chakraborty
Abstract:
In this work, we introduce RadarTrack, an innovative ego-speed estimation framework utilizing a single-chip millimeter-wave (mmWave) radar to deliver robust speed estimation for mobile platforms. Unlike previous methods that depend on cross-modal learning and computationally intensive Deep Neural Networks (DNNs), RadarTrack utilizes a novel phase-based speed estimation approach. This method effectively overcomes the limitations of conventional ego-speed estimation approaches, which rely on Doppler measurements and static surroundings. RadarTrack is designed for low-latency operation on embedded platforms, making it suitable for real-time applications where speed and efficiency are critical. Our key contributions include the introduction of a novel phase-based speed estimation technique solely based on signal processing and the implementation of a real-time prototype validated through extensive real-world evaluations. By providing a reliable and lightweight solution for ego-speed estimation, RadarTrack holds significant potential for a wide range of applications, including micro-robotics, augmented reality, and autonomous navigation.
△ Less
Submitted 20 April, 2025;
originally announced April 2025.
-
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Authors:
Ashwinee Panda,
Vatsal Baherwani,
Zain Sarwar,
Benjamin Therien,
Supriyo Chakraborty,
Tom Goldstein
Abstract:
Mixture of Experts (MoE) pretraining is more scalable than dense Transformer pretraining, because MoEs learn to route inputs to a sparse set of their feedforward parameters. However, this means that MoEs only receive a sparse backward update, leading to training instability and suboptimal performance. We present a lightweight approximation method that gives the MoE router a dense gradient update while continuing to sparsely activate its parameters. Our method, which we refer to as Default MoE, substitutes missing expert activations with default outputs consisting of an exponential moving average of expert outputs previously seen over the course of training. This allows the router to receive signals from every expert for each token, leading to significant improvements in training performance. Our Default MoE outperforms standard TopK routing in a variety of settings without requiring significant computational overhead. Code: https://github.com/vatsal0/default-moe.
Submitted 17 April, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
On the Need to Rethink Trust in AI Assistants for Software Development: A Critical Review
Authors:
Sebastian Baltes,
Timo Speith,
Brenda Chiteri,
Seyedmoein Mohsenimofidi,
Shalini Chakraborty,
Daniel Buschek
Abstract:
Trust is a fundamental concept in human decision-making and collaboration that has long been studied in philosophy and psychology. However, software engineering (SE) articles often use the term trust informally: providing an explicit definition or embedding results in established trust models is rare. In SE research on AI assistants, this practice culminates in equating trust with the likelihood of accepting generated content, which, in isolation, does not capture the full complexity of the trust concept. Without a common definition, true secondary research on trust is impossible. The objectives of our research were: (1) to present the psychological and philosophical foundations of human trust, (2) to systematically study how trust is conceptualized in SE and the related disciplines human-computer interaction and information systems, and (3) to discuss limitations of equating trust with content acceptance, outlining how SE research can adopt existing trust models to overcome the widespread informal use of the term trust. We conducted a literature review across disciplines and a critical review of recent SE articles focusing on conceptualizations of trust. We found that trust is rarely defined or conceptualized in SE articles. Related disciplines commonly embed their methodology and results in established trust models, clearly distinguishing, for example, between initial trust and trust formation and between appropriate and inappropriate trust. On a meta-scientific level, other disciplines further discuss whether and when trust can be applied to AI assistants at all. Our study reveals a significant maturity gap of trust research in SE compared to related disciplines. We provide concrete recommendations on how SE researchers can adopt established trust models and instruments to study trust in AI assistants beyond the acceptance of generated software artifacts.
Submitted 5 August, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures
Authors:
Prabhu Vellaisamy,
Thomas Labonte,
Sourav Chakraborty,
Matt Turner,
Samantika Sury,
John Paul Shen
Abstract:
Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-GPU coupled architectures is crucial for optimization. This paper presents an in-depth analysis of LLM inference behavior on loosely-coupled (PCIe A100/H100) and closely-coupled (GH200) systems. We analyze performance dynamics using fine-grained operator-to-kernel trace analysis, facilitated by our novel profiler SKIP and metrics like Total Kernel Launch and Queuing Time (TKLQT). Results show that closely-coupled (CC) GH200 significantly outperforms loosely-coupled (LC) systems at large batch sizes, achieving 1.9x-2.7x faster prefill latency for Llama 3.2-1B. However, our analysis also reveals that GH200 remains CPU-bound up to 4x larger batch sizes than LC systems. In this extended CPU-bound region, we identify the performance characteristics of the Grace CPU as a key factor contributing to higher inference latency at low batch sizes on GH200. We demonstrate that TKLQT accurately identifies this CPU/GPU-bound transition point. Based on this analysis, we further show that kernel fusion offers significant potential to mitigate GH200's low-batch latency bottleneck by reducing kernel launch overhead. This detailed kernel-level characterization provides critical insights for optimizing diverse CPU-GPU coupling strategies. This work is an initial effort, and we plan to explore other major AI/DL workloads that demand different degrees of CPU-GPU heterogeneous architectures.
Submitted 16 April, 2025;
originally announced April 2025.
-
Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems
Authors:
Seyedeh Baharan Khatami,
Sayan Chakraborty,
Ruomeng Xu,
Babak Salimi
Abstract:
Evaluating retrieval-ranking systems is crucial for developing high-performing models. While online A/B testing is the gold standard, its high cost and risks to user experience require effective offline methods. However, relying on historical interaction data introduces biases (such as selection, exposure, conformity, and position biases) that distort evaluation metrics. These biases are driven by the Missing-Not-At-Random (MNAR) nature of user interactions and favor popular or frequently exposed items over true user preferences.
We propose a novel framework for robust offline evaluation of retrieval-ranking systems, transforming MNAR data into Missing-At-Random (MAR) through reweighting combined with black-box optimization, guided by neural estimation of information-theoretic metrics. Our contributions include (1) a causal formulation for addressing offline evaluation biases, (2) a system-agnostic debiasing framework, and (3) empirical validation of its effectiveness. This framework enables more accurate, fair, and generalizable evaluations, enhancing model assessment before deployment.
Submitted 4 April, 2025;
originally announced April 2025.
-
Quantum singular value transformation without block encodings: Near-optimal complexity with minimal ancilla
Authors:
Shantanav Chakraborty,
Soumyabrata Hazra,
Tongyang Li,
Changpeng Shao,
Xinzhao Wang,
Yuxin Zhang
Abstract:
We develop new algorithms for Quantum Singular Value Transformation (QSVT), a unifying framework that encapsulates most known quantum algorithms and serves as the foundation for new ones. Existing implementations of QSVT rely on block encoding, incurring an intrinsic $O(\log L)$ ancilla overhead and circuit depth $\widetilde{O}(L dλ)$ for polynomial transformations of a Hamiltonian $H=\sum_{k=1}^L H_k$, where $d$ is the polynomial degree and $λ=\sum_{k}\|H_k\|$.
We introduce a simple yet powerful approach that utilizes only basic Hamiltonian simulation techniques, namely, Trotter methods, to: (i) eliminate the need for block encoding, (ii) reduce the ancilla overhead to only a single qubit, and (iii) still maintain near-optimal complexity. Our method achieves a circuit depth of $\widetilde{O}(L(dλ_{\mathrm{comm}})^{1+o(1)})$, without requiring any complicated multi-qubit controlled gates. Moreover, $λ_{\mathrm{comm}}$ depends on the nested commutators of the terms of $H$ and can be substantially smaller than $λ$ for many physically relevant Hamiltonians, a feature absent in standard QSVT. To achieve these results, we make use of Richardson extrapolation in a novel way, systematically eliminating errors in any interleaved sequence of arbitrary unitaries and Hamiltonian evolution operators, thereby establishing a general framework that encompasses QSVT but is more broadly applicable.
As applications, we develop end-to-end quantum algorithms for solving linear systems and estimating ground state properties of Hamiltonians, both achieving near-optimal complexity without relying on oracular access. Overall, our results establish a new framework for quantum algorithms, significantly reducing hardware overhead while maintaining near-optimal performance, with implications for both near-term and fault-tolerant quantum computing.
Submitted 3 September, 2025; v1 submitted 3 April, 2025;
originally announced April 2025.
-
On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows
Authors:
Souradip Chakraborty,
Mohammadreza Pourreza,
Ruoxi Sun,
Yiwen Song,
Nino Scherrer,
Furong Huang,
Amrit Singh Bedi,
Ahmad Beirami,
Jindong Gu,
Hamid Palangi,
Tomas Pfister
Abstract:
Agentic AI workflows (systems that autonomously plan and act) are becoming widespread, yet their success rate on complex tasks remains low. A promising solution is inference-time alignment, which uses extra compute at test time to improve performance. Inference-time alignment relies on three components: sampling, evaluation, and feedback. While most prior work studies sampling and automatic evaluation, feedback remains underexplored. To study the role of feedback, we introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques (reward models or AI-generated textual feedback) between decoding steps. Through IAD, we analyze feedback along four dimensions: (1) its role in accuracy-compute trade-offs under a limited inference budget, (2) its gains over diversity-only baselines such as best-of-N sampling, (3) the effectiveness of composing feedback from reward models versus textual critique, and (4) its robustness to noisy or low-quality feedback. Across Sketch2Code, Text2SQL, Intercode, and WebShop, we show that IAD with proper integration of high-fidelity feedback yields consistent gains of up to 10 percent absolute over various baselines such as best-of-N. Our findings underscore feedback as a crucial knob for inference-time alignment of agentic AI workflows under a limited inference budget.
Submitted 7 July, 2025; v1 submitted 2 April, 2025;
originally announced April 2025.