Search | arXiv e-print repository

Scalable Learning of High-Dimensional Demonstrations with Composition of Linear Parameter Varying Dynamical Systems

Authors: Shreenabh Agrawal, Hugo T. M. Kussaba, Lingyun Chen, Allen Emmanuel Binny, Abdalla Swikir, Pushpak Jagtap, Sami Haddadin

Abstract: Learning from Demonstration (LfD) techniques enable robots to learn and generalize tasks from user demonstrations, eliminating the need for coding expertise among end-users. One established technique to implement LfD in robots is to encode demonstrations in a stable Dynamical System (DS). However, finding a stable dynamical system entails solving an optimization problem with bilinear matrix inequality (BMI) constraints, a non-convex problem which, depending on the number of scalar constraints and variables, demands significant computational resources and is susceptible to numerical issues such as floating-point errors. To address these challenges, we propose a novel compositional approach that enhances the applicability and scalability of learning stable DSs with BMIs.

Submitted 5 July, 2025; originally announced July 2025.

Comments: Submitted to the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

MSC Class: 68T40 ACM Class: I.2.9

arXiv:2506.22694 [pdf, ps, other]

VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs

Authors: Raghavv Goel, Sudhanshu Agrawal, Mukul Gagrani, Junyoung Park, Yifan Zao, He Zhang, Tian Liu, Yiping Yang, Xin Yuan, Jiuyan Lu, Chris Lott, Mingu Lee

Abstract: In this paper, we introduce a simple training-free technique to improve the performance of drafter-based speculative decoding (SpD) methods that incorporate a language modeling head (LM head) during the drafting process. Drafter-based speculative decoding leverages one or more smaller language models, a.k.a. drafters or draft models, to sample a draft sequence or tree consisting of multiple tokens, followed by verification by a base LLM, a target model, accepting a subset as its valid generation. As it is usually considered that speculative decoding requires a one-to-one mapping between the vocabularies of the target model and the draft model, it has been natural to share the vocabulary between them, or even share the LM head as in EAGLE or Medusa. We first identify that this draft token sampling scheme inherently contains an unnecessary inference overhead in drafting, especially for some target LLMs with very large vocabularies. Then, we propose a simple technique, VocabTrim, to mitigate the drafting overhead and improve the generation speed in memory-bound environments. VocabTrim reconstructs the drafter LM head to contain only a limited set of tokens, selected as those most frequently sampled from the vocabulary of the target model. While limiting the vocabulary in drafting slightly degrades the acceptance rate, it significantly reduces the drafting latency in memory-bound processes, which is often the case on edge devices, resulting in a higher memory-bound speed-up (MBSU). We show that our method can boost the memory-bound speed-up for Llama-3 models on Spec-Bench, specifically by 16% for Llama-3.2-3B-Instruct.

Submitted 3 July, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

Comments: 8 pages, 4 figures, 5 tables, accepted at ICML 2025 workshop on Efficient Systems for Foundational Models
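The head-trimming idea in the VocabTrim abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `trim_lm_head` and `draft_next_token`, the tiny weight matrix, and the use of greedy argmax in place of sampling are all assumptions made for illustration. The point it shows is that the drafter scores only the kept rows and then maps the winning row back to its id in the full vocabulary.

```python
from collections import Counter


def trim_lm_head(lm_head, sampled_token_ids, keep):
    """Keep only the LM-head rows of the `keep` most frequently sampled tokens.

    lm_head: list of weight rows, one per token id in the full vocabulary.
    sampled_token_ids: token ids observed in (hypothetical) target-model samples.
    Returns the trimmed rows and the full-vocabulary id of each kept row.
    """
    counts = Counter(sampled_token_ids)
    kept_ids = [tok for tok, _ in counts.most_common(keep)]
    trimmed = [lm_head[tok] for tok in kept_ids]
    return trimmed, kept_ids


def draft_next_token(trimmed_head, kept_ids, hidden):
    """Score only the kept rows, then map the best row back to a full-vocab id.

    Greedy argmax is an assumption here; a real drafter would usually sample.
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in trimmed_head]
    best = max(range(len(logits)), key=logits.__getitem__)
    return kept_ids[best]


# Toy full vocabulary of 5 tokens with 2-dimensional hidden states (made up).
lm_head = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
sampled = [2, 2, 2, 0, 0, 4]  # token 2 seen three times, token 0 twice, token 4 once

trimmed_head, kept_ids = trim_lm_head(lm_head, sampled, keep=2)
next_token = draft_next_token(trimmed_head, kept_ids, hidden=[0.2, -1.0])
```

With `keep=2` the drafter multiplies the hidden state against two rows instead of five, which is where the latency saving described in the abstract would come from; tokens outside the kept set can never be drafted, which is the source of the slightly lower acceptance rate.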

arXiv:2506.19743 [pdf, ps, other]

NEAR$^2$: A Nested Embedding Approach to Efficient Product Retrieval and Ranking

Authors: Shenbin Qian, Diptesh Kanojia, Samarth Agrawal, Hadeel Saadany, Swapnil Bhosale, Constantin Orasan, Zhe Wu

Abstract: E-commerce information retrieval (IR) systems struggle to simultaneously achieve high accuracy in interpreting complex user queries and maintain efficient processing of vast product catalogs. The dual challenge lies in precisely matching user intent with relevant products while managing the computational demands of real-time search across massive inventories. In this paper, we propose a Nested Embedding Approach to product Retrieval and Ranking, called NEAR$^2$, which can achieve up to $12$ times efficiency in embedding size at inference time while introducing no extra cost in training and improving performance in accuracy for various encoder-based Transformer models. We validate our approach using different loss functions for the retrieval and ranking task, including multiple negative ranking loss and online contrastive loss, on four different test sets with various IR challenges such as short and implicit queries. Our approach achieves an improved performance over a smaller embedding dimension, compared to any existing models.

Submitted 24 June, 2025; originally announced June 2025.

Comments: This paper is accepted to the 2025 SIGIR Workshop on eCommerce
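As a rough illustration of how a nested embedding can be used at a smaller size at inference time, the sketch below truncates vectors to their leading dimensions, renormalizes them, and ranks products by cosine similarity. It assumes that nested embeddings are consumed by simple prefix truncation; the abstract does not spell out the mechanism, and the function names and the 4-dimensional toy vectors are hypothetical.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and rescale the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical full-size embeddings (4 dimensions for readability).
query = [0.9, 0.1, 0.3, -0.2]
product_a = [0.8, 0.2, -0.5, 0.4]
product_b = [0.1, 0.9, 0.3, -0.2]

# Rank using only the first 2 dimensions, i.e. half the storage per vector.
q = truncate_embedding(query, 2)
score_a = dot(q, truncate_embedding(product_a, 2))
score_b = dot(q, truncate_embedding(product_b, 2))
ranking = sorted([("product_a", score_a), ("product_b", score_b)],
                 key=lambda item: item[1], reverse=True)
```

Storing and comparing only the leading coordinates is what makes the index smaller; whether accuracy holds at the shorter length depends on how the model was trained, which is the claim the paper evaluates.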

arXiv:2506.15526 [pdf, ps, other]

Null infinity as an inverted extremal horizon: Matching an infinite set of conserved quantities for gravitational perturbations

Authors: Shreyansh Agrawal, Panagiotis Charalambous, Laura Donnay

Abstract: Every spacetime that is asymptotically flat near null infinity can be conformally mapped via a spatial inversion onto the geometry around an extremal, non-rotating and non-expanding horizon. We set up a dictionary for this geometric duality, connecting the geometry and physics near null infinity to those near the dual horizon. We then study its physical implications for conserved quantities for extremal black holes, extending previously known results to the case of gravitational perturbations. In particular, we derive a tower of near-horizon gravitational charges that are exactly conserved and show their one-to-one matching with Newman-Penrose conserved quantities associated with gravitational perturbations of the extremal Reissner-Nordström black hole geometry. We furthermore demonstrate the physical relevance of spatial inversions for extremal Kerr-Newman black holes, even if the latter are notoriously not conformally isometric under such inversions.

Submitted 18 June, 2025; originally announced June 2025.

Comments: 46+5 pages, 2 figures

arXiv:2506.02546 [pdf, ps, other]

Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems

Authors: Pengfei He, Zhenwei Dai, Xianfeng Tang, Yue Xing, Hui Liu, Jingying Zeng, Qiankun Peng, Shrivats Agrawal, Samarth Varshney, Suhang Wang, Jiliang Tang, Qi He

Abstract: Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. While some existing studies approach the trustworthiness, they focus on a single type of harmfulness rather than analyze it in a holistic approach from multiple trustworthiness perspectives. In this work, we propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness. Inspired by human communication literature[1], through systematically analyzing attention behaviors across six orthogonal trust dimensions, we find that certain attention heads in the LLM specialize in detecting specific types of violations. Leveraging these insights, A-Trust directly infers trustworthiness from internal attention patterns without requiring external prompts or verifiers. Building upon A-Trust, we develop a principled and efficient trust management system (TMS) for LLM-MAS, enabling both message-level and agent-level trust assessment. Experiments across diverse multi-agent settings and tasks demonstrate that applying our TMS significantly enhances robustness against malicious inputs.

Submitted 3 June, 2025; originally announced June 2025.

arXiv:2506.02206 [pdf, other]

Reinforcement Learning with Data Bootstrapping for Dynamic Subgoal Pursuit in Humanoid Robot Navigation

Authors: Chengyang Peng, Zhihao Zhang, Shiting Gong, Sankalp Agrawal, Keith A. Redmill, Ayonga Hereid

Abstract: Safe and real-time navigation is fundamental for humanoid robot applications. However, existing bipedal robot navigation frameworks often struggle to balance computational efficiency with the precision required for stable locomotion. We propose a novel hierarchical framework that continuously generates dynamic subgoals to guide the robot through cluttered environments. Our method comprises a high-level reinforcement learning (RL) planner for subgoal selection in a robot-centric coordinate system and a low-level Model Predictive Control (MPC) based planner which produces robust walking gaits to reach these subgoals. To expedite and stabilize the training process, we incorporate a data bootstrapping technique that leverages a model-based navigation approach to generate a diverse, informative dataset. We validate our method in simulation using the Agility Robotics Digit humanoid across multiple scenarios with random obstacles. Results show that our framework significantly improves navigation success rates and adaptability compared to both the original model-based method and other learning-based methods.

Submitted 2 June, 2025; originally announced June 2025.

Comments: 8 pages, 5 figures, 3 tables

arXiv:2506.02082 [pdf, ps, other]

doi 10.1109/SPCOM60851.2024.10631576

SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction

Authors: Saurabh Agrawal, Raj Gohil, Gopal Kumar Agrawal, Vikram C M, Kushal Verma

Abstract: Speech quality assessment is a critical process in selecting text-to-speech synthesis (TTS) or voice conversion models. Evaluation of voice synthesis can be done using objective metrics or subjective metrics. Although there are many objective metrics, such as the Perceptual Evaluation of Speech Quality (PESQ), Perceptual Objective Listening Quality Assessment (POLQA) or Short-Time Objective Intelligibility (STOI), none of them is feasible for selecting the best model. On the other hand, a subjective metric like the Mean Opinion Score (MOS) is highly reliable, but it requires a lot of manual effort and is time-consuming. To counter the issues in MOS evaluation, we have developed a novel model, Speaker Agnostic Latent Features (SALF)-Mean Opinion Score (MOS), which is a small-sized, end-to-end, highly generalized and scalable model for predicting the MOS score on a scale of 5. We use sequences of convolutions and stack them to get the latent features of the audio samples to get the best state-of-the-art results based on mean squared error (MSE), Linear Concordance Correlation coefficient (LCC), Spearman Rank Correlation Coefficient (SRCC) and Kendall Rank Correlation Coefficient (KTAU).

Submitted 2 June, 2025; originally announced June 2025.

Journal ref: 2024 International Conference on Signal Processing and Communications (SPCOM), 2024, pages 1-5, 10631576

arXiv:2506.00917 [pdf, ps, other]

Q-learning with Posterior Sampling

Authors: Priyank Agrawal, Shipra Agrawal, Azmat Azati

Abstract: Bayesian posterior sampling techniques have demonstrated superior empirical performance in many exploration-exploitation settings. However, their theoretical analysis remains a challenge, especially in complex settings like reinforcement learning. In this paper, we introduce Q-Learning with Posterior Sampling (PSQL), a simple Q-learning-based algorithm that uses Gaussian posteriors on Q-values for exploration, akin to the popular Thompson Sampling algorithm in the multi-armed bandit setting. We show that in the tabular episodic MDP setting, PSQL achieves a regret bound of $\tilde O(H^2\sqrt{SAT})$, closely matching the known lower bound of $Ω(H\sqrt{SAT})$. Here, S, A denote the number of states and actions in the underlying Markov Decision Process (MDP), and $T=KH$ with $K$ being the number of episodes and $H$ being the planning horizon. Our work provides several new technical insights into the core challenges in combining posterior sampling with dynamic programming and TD-learning-based RL algorithms, along with novel ideas for resolving those difficulties. We hope this will form a starting point for analyzing this efficient and important algorithmic technique in even more complex RL settings.

Submitted 1 June, 2025; originally announced June 2025.

Comments: 39 Pages

arXiv:2505.21048 [pdf, ps, other]

Entanglement Negativity of Spin-Orbit Correlations in a general Qubit-Qudit Setup

Authors: Sanskriti Agrawal, Raktim Abir

Abstract: We present the complete eigenvalue spectrum of the partially transposed density matrix for a pure bipartite quantum state acting on a generic $2 \otimes n$ Hilbert space. The spectrum contains four non-zero eigenvalues, $λ_{1,2}=\pm \sqrt{A}$ and $λ_{3,4}= \frac{1}{2}(1\pm\sqrt{1-4 A})$, where $A$ is the determinant of the reduced density matrix (traced over the larger subspace). As $0 \leqslant A \leqslant 1/4$, only one is negative among the four non-trivial eigenvalues. Within this qubit-qudit framework, we further studied the negativity as a measure of entanglement for the case of spin-orbit correlation of partons inside a proton. The entanglement negativity for spin-orbit correlations is found to be related to the gluon helicity PDF and the Hermitian angle of the associated Hilbert space for linearly polarized protons.

Submitted 10 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.
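A short consistency check on the spectrum quoted in the abstract above, written out as a worked equation. It assumes the usual definition of negativity as the sum of the absolute values of the negative eigenvalues of the partially transposed density matrix; the abstract does not state which normalization the authors use. The four eigenvalues sum to one, as they must because partial transposition preserves the trace, and the single negative eigenvalue fixes the negativity:

```latex
\begin{align}
\sum_{i=1}^{4} \lambda_i
  &= \sqrt{A} - \sqrt{A}
   + \tfrac{1}{2}\left(1 + \sqrt{1-4A}\right)
   + \tfrac{1}{2}\left(1 - \sqrt{1-4A}\right) = 1, \\
\mathcal{N}
  &= \sum_{\lambda_i < 0} \left|\lambda_i\right| = \sqrt{A},
  \qquad 0 \le \mathcal{N} \le \tfrac{1}{2}.
\end{align}
```

At the upper limit $A = 1/4$ the spectrum becomes $\{-\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}\}$, and at $A = 0$ (a product state) the negativity vanishes.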

arXiv:2505.13762

From Structural Design to Dynamics Modeling: Control-Oriented Development of a 3-RRR Parallel Ankle Rehabilitation Robot

Authors: Siyuan Zhang, Yufei Zhang, Junlin Lyu, Sunil K. Agrawal

Abstract: This paper presents the development of a wearable ankle rehabilitation robot based on a 3-RRR spherical parallel mechanism (SPM) to support multi-DOF recovery through pitch, roll, and yaw motions. The system features a compact, ergonomic structure designed for comfort, safety, and compatibility with ankle biomechanics. A complete design-to-dynamics pipeline has been implemented, including structural design, kinematic modeling for motion planning, and Lagrangian-based dynamic modeling for torque estimation and simulation analysis. Preliminary simulations verify stable joint coordination and smooth motion tracking under representative rehabilitation trajectories. The control framework is currently being developed to enhance responsiveness across the workspace. Future work will focus on integrating personalized modeling and adaptive strategies to address kinematic singularities through model based control. This work establishes a foundational platform for intelligent, personalized ankle rehabilitation, enabling both static training and potential extension to gait-phase-timed assistance.

Submitted 30 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

Comments: This paper was originally submitted as a class project and included the name of a faculty member without prior permission. At the instructor's request, I am withdrawing the paper. The work may be resubmitted in the future after further development and testing

arXiv:2504.20106 [pdf, other]

Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors

Authors: Ren-Wei Liang, Chin-Ting Hsu, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua Sun

Abstract: Ensuring that large language models (LLMs) are both helpful and harmless is a critical challenge, as overly strict constraints can lead to excessive refusals, while permissive models risk generating harmful content. Existing approaches, such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO), attempt to balance these trade-offs but suffer from performance conflicts, limited controllability, and poor extendability. To address these issues, we propose Preference Vector, a novel framework inspired by task arithmetic. Instead of optimizing multiple preferences within a single objective, we train separate models on individual preferences, extract behavior shifts as preference vectors, and dynamically merge them at test time. This modular approach enables fine-grained, user-controllable preference adjustments and facilitates seamless integration of new preferences without retraining. Experiments show that our proposed Preference Vector framework improves helpfulness without excessive conservatism, allows smooth control over preference trade-offs, and supports scalable multi-preference alignment.

Submitted 27 April, 2025; originally announced April 2025.

Comments: 22 pages, 5 figures, 9 tables
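The "extract, then merge at test time" step in the Preference Vector abstract can be sketched in the style of task arithmetic. The sketch assumes each preference vector is a plain parameter difference from a shared base model and that merging is a weighted sum; the abstract does not give the exact extraction rule, so treat this as an illustration only. The names `preference_vector` and `merge`, and the two-parameter toy "models", are hypothetical.

```python
def preference_vector(base, tuned):
    """Behavior shift of a preference-tuned model relative to the base model."""
    return {name: tuned[name] - base[name] for name in base}


def merge(base, vectors, weights):
    """Add each preference vector to the base parameters, scaled by its weight."""
    merged = dict(base)
    for vec, weight in zip(vectors, weights):
        for name, delta in vec.items():
            merged[name] += weight * delta
    return merged


# Toy "models": each parameter is a single float (made-up values).
base = {"w": 1.0, "b": 0.0}
helpful_model = {"w": 1.5, "b": 0.25}
harmless_model = {"w": 0.5, "b": 0.5}

v_helpful = preference_vector(base, helpful_model)
v_harmless = preference_vector(base, harmless_model)

# User-chosen trade-off: full helpfulness shift, half of the harmlessness shift.
merged = merge(base, [v_helpful, v_harmless], weights=[1.0, 0.5])
```

Because the weights are applied only at merge time, changing the trade-off or adding a third preference vector requires no retraining of the existing ones, which matches the modularity the abstract describes.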

arXiv:2504.19952 [pdf, ps, other]

On Stopping Times of Power-one Sequential Tests: Tight Lower and Upper Bounds

Authors: Shubhada Agrawal, Aaditya Ramdas

Abstract: We prove two lower bounds for stopping times of sequential tests between general composite nulls and alternatives. The first lower bound is for the setting where the type-1 error level $α$ approaches zero, and equals $\log(1/α)$ divided by a certain infimum KL divergence, termed $\operatorname{KL_{inf}}$. The second lower bound applies to the setting where $α$ is fixed and $\operatorname{KL_{inf}}$ approaches 0 (meaning that the null and alternative sets are not separated) and equals $c \operatorname{KL_{inf}}^{-1} \log \log \operatorname{KL_{inf}}^{-1}$ for a universal constant $c > 0$. We also provide a sufficient condition for matching the upper bounds and show that this condition is met in several special cases. Given past work, these upper and lower bounds are unsurprising in their form; our main contribution is the generality in which they hold, for example, not requiring reference measures or compactness of the classes.

Submitted 28 April, 2025; originally announced April 2025.

Comments: 36 pages

arXiv:2504.12996 [pdf, other]

SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation

Authors: Saransh Agrawal, Kuan-Hao Huang

Abstract: Large language models (LLMs) frequently memorize sensitive information during training, posing risks when deploying publicly accessible models. Current machine unlearning methods struggle to selectively remove specific data associations without degrading overall model capabilities. This paper presents our solution to SemEval-2025 Task 4 on targeted unlearning, which introduces a two-stage methodology that combines causal mediation analysis with layer-specific optimization. Through systematic causal tracing experiments on OLMo architectures (1B and 7B parameters), we identify the critical role of the first few transformer layers (layers 0-5) in storing subject-attribute associations within MLP modules. Building on this insight, we develop a constrained optimization approach that freezes upper layers while applying a novel joint loss function to lower layers, simultaneously maximizing forget set loss via output token cross-entropy penalties and minimizing retain set deviation through adaptive regularization. Our method achieves 2nd place in the 1B model track, demonstrating strong task performance while maintaining 88% of baseline MMLU accuracy. These results establish causal-informed layer optimization as a promising paradigm for efficient, precise unlearning in LLMs, offering a significant step forward in addressing data privacy concerns in AI systems.

Submitted 17 April, 2025; originally announced April 2025.

Comments: 8 pages, In Proceedings of The 19th International Workshop on Semantic Evaluation (SemEval), 2025

arXiv:2504.12140 [pdf, other]

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Authors: Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal, André F. T. Martins

Abstract: Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation, but scaling to document-level translation remains challenging, particularly in modeling long-range dependencies and discourse phenomena across sentences and paragraphs. In this work, we propose a method to improve LLM-based long-document translation through targeted fine-tuning on high-quality document-level data, which we curate and introduce as DocBlocks. Our approach supports multiple translation paradigms, including direct document-to-document and chunk-level translation, by integrating instructions both with and without surrounding context. This enables models to better capture cross-sentence dependencies while maintaining strong sentence-level translation performance. Experimental results show that incorporating multiple translation paradigms improves document-level translation quality and inference speed compared to prompting and agent-based methods.

Submitted 16 April, 2025; originally announced April 2025.

Comments: 9 pages, work-in-progress

arXiv:2504.11829 [pdf, other]

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Authors: Julia Kreutzer, Eleftheria Briakou, Sweta Agrawal, Marzieh Fadaee, Kocmi Tom

Abstract: Generation capabilities and language coverage of multilingual large language models (mLLMs) are advancing rapidly. However, evaluation practices for generative abilities of mLLMs are still lacking comprehensiveness, scientific rigor, and consistent adoption across research labs, which undermines their potential to meaningfully guide mLLM development. We draw parallels with machine translation (MT) evaluation, a field that faced similar challenges and has, over decades, developed transparent reporting standards and reliable evaluations for multilingual generative models. Through targeted experiments across key stages of the generative evaluation pipeline, we demonstrate how best practices from MT evaluation can deepen the understanding of quality differences between models. Additionally, we identify essential components for robust meta-evaluation of mLLMs, ensuring the evaluation methods themselves are rigorously assessed. We distill these insights into a checklist of actionable recommendations for mLLM research and development.

Submitted 17 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

arXiv:2504.11045 [pdf, other]

Neural Control Barrier Functions from Physics Informed Neural Networks

Authors: Shreenabh Agrawal, Manan Tayal, Aditya Singh, Shishir Kolathaya

Abstract: As autonomous systems become increasingly prevalent in daily life, ensuring their safety is paramount. Control Barrier Functions (CBFs) have emerged as an effective tool for guaranteeing safety; however, manually designing them for specific applications remains a significant challenge. With the advent of deep learning techniques, recent research has explored synthesizing CBFs using neural networks, commonly referred to as neural CBFs. This paper introduces a novel class of neural CBFs that leverages a physics-inspired neural network framework by incorporating Zubov's Partial Differential Equation (PDE) within the context of safety. This approach provides a scalable methodology for synthesizing neural CBFs applicable to high-dimensional systems. Furthermore, by utilizing reciprocal CBFs instead of zeroing CBFs, the proposed framework allows for the specification of flexible, user-defined safe regions. To validate the effectiveness of the approach, we present case studies on three different systems: an inverted pendulum, autonomous ground navigation, and aerial navigation in obstacle-laden environments.

Submitted 15 April, 2025; originally announced April 2025.

Comments: 8 pages, 5 figures

arXiv:2504.10577 [pdf, ps, other]

Soft theorems and spontaneous symmetry breaking

Authors: Shreyansh Agrawal, Kevin Nguyen

Abstract: The soft photon and soft graviton theorems of Weinberg are known to derive from conservation laws associated with asymptotic symmetries. Within the corresponding classical theories, one often speaks of spontaneous symmetry breaking and vacuum degeneracy, but a genuine quantum description of this phenomenon has largely been lacking. Here we establish spontaneous breaking of asymptotic symmetries and the existence of Goldstone 'particles' using exclusively the language of quantum field theory. This is made possible through the reformulation of massless scattering theory in terms of carrollian conformal field theory, and the observation that soft theorems correspond to Ward identities of broken symmetries. A suitable version of Goldstone theorem shows that there must exist zero-momentum particles described by conformal fields on the celestial sphere, in agreement with the common lore. More specifically, these belong to unitary representations in the discrete series of the Lorentz group, and are therefore naturally equipped with logarithmic two-point functions. We discuss the relevance of these observations to the problem of infrared divergences that scattering amplitudes suffer from.

Submitted 26 May, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

Comments: 7 pages; v2: new discussion of the Goldstone two-point function + added references

arXiv:2504.07583 [pdf, other]

Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering

Authors: Patrick Fernandes, Sweta Agrawal, Emmanouil Zaranis, André F. T. Martins, Graham Neubig

Abstract: Despite the steady progress in machine translation evaluation, existing automatic metrics struggle to capture how well meaning is preserved beyond sentence boundaries. We posit that reliance on a single intrinsic quality score, trained to mimic human judgments, might be insufficient for evaluating translations of long, complex passages, and a more "pragmatic" approach that assesses how accurately key information is conveyed by a translation in context is needed. We introduce TREQA (Translation Evaluation via Question-Answering), a framework that extrinsically evaluates translation quality by assessing how accurately candidate translations answer reading comprehension questions that target key information in the original source or reference texts. In challenging domains that require long-range understanding, such as literary texts, we show that TREQA is competitive with and, in some cases, outperforms state-of-the-art neural and LLM-based metrics in ranking alternative paragraph-level translations, despite never being explicitly optimized to correlate with human judgments. Furthermore, the generated questions and answers offer interpretability: empirical analysis shows that they effectively target translation errors identified by experts in evaluated datasets. Our code is available at https://github.com/deep-spin/treqa

Submitted 11 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

arXiv:2503.21400 [pdf, ps, other]

Lattice Based Crypto breaks in a Superposition of Spacetimes

Authors: Divesh Aggarwal, Shashwat Agrawal, Rajendra Kumar

Abstract: We explore the computational implications of a superposition of spacetimes, a phenomenon hypothesized in quantum gravity theories. This was initiated by Shmueli (2024) where the author introduced the complexity class $\mathbf{BQP^{OI}}$ consisting of promise problems decidable by quantum polynomial time algorithms with access to an oracle for computing order interference. In this work, it was shown that the Graph Isomorphism problem and the Gap Closest Vector Problem (with approximation factor $\mathcal{O}(n^{3/2})$) are in $\mathbf{BQP^{OI}}$. We extend this result by showing that the entire complexity class $\mathbf{SZK}$ (Statistical Zero Knowledge) is contained within $\mathbf{BQP^{OI}}$. This immediately implies that the security of numerous lattice based cryptography schemes will be compromised in a computational model based on superposition of spacetimes, since these often rely on the hardness of the Learning with Errors problem, which is in $\mathbf{SZK}$.

Submitted 1 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

arXiv:2503.16481 [pdf, other]

Pedestrians and Robots: A Novel Dataset for Learning Distinct Social Navigation Forces

Authors: Subham Agrawal, Nico Ostermann-Myrau, Nils Dengler, Maren Bennewitz

Abstract: The increasing use of robots in human-centric public spaces such as shopping malls, sidewalks, and hospitals, requires understanding of how pedestrians respond to their presence. However, existing research lacks comprehensive datasets that capture the full range of pedestrian behaviors, e.g., including avoidance, neutrality, and attraction in the presence of robots. Such datasets can be used to effectively learn models capable of accurately predicting diverse responses of pedestrians to robot presence, which are crucial for advancing robot navigation strategies and optimizing pedestrian-aware motion planning. In this paper, we address these challenges by collecting a novel dataset of pedestrian motion in two outdoor locations under three distinct conditions, i.e., no robot presence, a stationary robot, and a moving robot. Thus, unlike existing datasets, ours explicitly encapsulates variations in pedestrian behavior across the different robot conditions. Using our dataset, we propose a novel Neural Social Robot Force Model (NSRFM), an extension of the traditional Social Force Model that integrates neural networks and robot-induced forces to better predict pedestrian behavior in the presence of robots. We validate the NSRFM by comparing its generated trajectories on different real-world datasets. Furthermore, we implemented it in simulation to enable the learning and benchmarking of robot navigation strategies based on their impact on pedestrian movement. Our results demonstrate the model's effectiveness in replicating real-world pedestrian reactions and its utility in developing, evaluating, and benchmarking social robot navigation algorithms.

Submitted 5 March, 2025; originally announced March 2025.

arXiv:2503.11855 [pdf, other]

Learning-based Estimation of Forward Kinematics for an Orthotic Parallel Robotic Mechanism

Authors: Jingzong Zhou, Yuhan Zhu, Xiaobin Zhang, Sunil Agrawal, Konstantinos Karydis

Abstract: This paper introduces a 3D parallel robot with three identical five-degree-of-freedom chains connected to a circular brace end-effector, aimed to serve as an assistive device for patients with cervical spondylosis. The inverse kinematics of the system is solved analytically, whereas learning-based methods are deployed to solve the forward kinematics. The methods considered herein include a Koopman operator-based approach as well as a neural network-based approach. The task is to predict the position and orientation of end-effector trajectories. The dataset used to train these methods is based on the analytical solutions derived via inverse kinematics. The methods are tested both in simulation and via physical hardware experiments with the developed robot. Results validate the suitability of deploying learning-based methods for studying parallel mechanism forward kinematics that are generally hard to resolve analytically.

Submitted 14 March, 2025; originally announced March 2025.

arXiv:2503.07970 [pdf, other]

Sustaining Human Agency, Attending to Its Cost: An Investigation into Generative AI Design for Non-Native Speakers' Language Use

Authors: Yimin Xiao, Cartor Hancock, Sweta Agrawal, Nikita Mehandru, Niloufar Salehi, Marine Carpuat, Ge Gao

Abstract: AI systems and tools today can generate human-like expressions on behalf of people. It raises the crucial question about how to sustain human agency in AI-mediated communication. We investigated this question in the context of machine translation (MT) assisted conversations. Our participants included 45 dyads. Each dyad consisted of one new immigrant in the United States, who leveraged MT for English information seeking as a non-native speaker, and one local native speaker, who acted as the information provider. Non-native speakers could influence the English production of their message in one of three ways: labeling the quality of MT outputs, regular post-editing without additional hints, or augmented post-editing with LLM-generated hints. Our data revealed a greater exercise of non-native speakers' agency under the two post-editing conditions. This benefit, however, came at a significant cost to the dyadic-level communication performance. We derived insights for MT and other generative AI design from our findings.

Submitted 10 March, 2025; originally announced March 2025.

arXiv:2503.04828 [pdf, other]

Beyond Next Word Prediction: Developing Comprehensive Evaluation Frameworks for measuring LLM performance on real world applications

Authors: Vishakha Agrawal, Archie Chaudhury, Shreya Agrawal

Abstract: While Large Language Models (LLMs) are fundamentally next-token prediction systems, their practical applications extend far beyond this basic function. From natural language processing and text generation to conversational assistants and software use, LLMs have numerous use-cases, and have already acquired a significant degree of enterprise adoption. To evaluate such models, static evaluation datasets, consisting of a set of prompts and their corresponding ground truths, are often used to benchmark the efficacy of the model for a particular task. In this paper, we provide the basis for a more comprehensive evaluation framework, based upon a traditional game and tool-based architecture that enables a more overarching measurement of a model's capabilities. For simplicity, we provide a generalized foundation that can be extended, without significant alteration, to numerous scenarios, from specific use cases such as supply chain management or financial reasoning, to abstract measurements such as ethics or safety.

Submitted 5 March, 2025; originally announced March 2025.

arXiv:2503.02128 [pdf, other]

Aerial Infrared Health Monitoring of Solar Photovoltaic Farms at Scale

Authors: Isaac Corley, Conor Wallace, Sourav Agrawal, Burton Putrah, Jonathan Lwowski

Abstract: Solar photovoltaic (PV) farms represent a major source of global renewable energy generation, yet their true operational efficiency often remains unknown at scale. In this paper, we present a comprehensive, data-driven framework for large-scale airborne infrared inspection of North American solar installations. Leveraging high-resolution thermal imagery, we construct and curate a geographically diverse dataset encompassing thousands of PV sites, enabling machine learning-based detection and localization of defects that are not detectable in the visible spectrum. Our pipeline integrates advanced image processing, georeferencing, and airborne thermal infrared anomaly detection to provide rigorous estimates of performance losses. We highlight practical considerations in aerial data collection, annotation methodologies, and model deployment across a wide range of environmental and operational conditions. Our work delivers new insights into the reliability of large-scale solar assets and serves as a foundation for ongoing research on performance trends, predictive maintenance, and scalable analytics in the renewable energy sector.

Submitted 3 March, 2025; originally announced March 2025.

arXiv:2503.01946 [pdf, other]

Water dissociation and rotational broadening in the atmosphere of KELT-20 b from high-resolution spectroscopy

Authors: Luke Finnerty, Yinzi Xin, Jerry W. Xuan, Julie Inglis, Michael P. Fitzgerald, Shubh Agrawal, Ashley Baker, Randall Bartos, Geoffrey A. Blake, Benjamin Calvin, Sylvain Cetre, Jacques-Robert Delorme, Greg Doppmann, Daniel Echeverri, Katelyn Horstman, Chih-Chun Hsu, Nemanja Jovanovic, Joshua Liberman, Ronald A. López, Emily C. Martin, Dimitri Mawet, Evan Morris, Jacklyn Pezzato, Jean-Baptiste Ruffio, Ben Sappey , et al. (7 additional authors not shown)

Abstract: We present atmospheric retrievals from Keck/KPIC phase II observations of the ultra-hot Jupiter KELT-20/MASCARA-2 b. Previous free retrievals of molecular abundances for ultra-hot Jupiters have been impacted by significant model biases due to variations in vertical abundance profiles, which we address by including molecular dissociation into our retrieval framework as an additional free parameter. We measure the abundance of CO ($\rm \log CO_{MMR} = -2.5^{+0.6}_{-0.5}$) and obtain a lower limit on the abundance of H$_2$O ($\rm \log H{_2}O_{MMR} = -1.5^{+0.8}_{-1.0}$, $>-3.0$ at 95% confidence) in the atmosphere of KELT-20 b. These abundances yield an atmospheric $\rm C/O = 0.1^{+0.4}_{-0.1}$ ($\rm C/O < 0.9$ at 95% confidence) and suggest a metallicity approximately solar to $10\times$ solar. H$_2$O is dissociated at pressures below $\log P_{\rm H_2O} = -1.2^{+0.5}_{-0.7}$ bar, roughly consistent with predictions from chemical equilibrium models, and suggesting that the retrieved composition is not a result of assumptions about the vertical mixing profiles. We also constrain the rotational velocity of KELT-20 b to $v\sin i = 7.5\pm0.7$ km/s, suggesting the presence of a jet comparable to the sound speed in the direction of the planet's rotation, assuming the actual rotation of the planet is tidally locked.

Submitted 3 March, 2025; originally announced March 2025.

Comments: 24 pages, 2 tables, 10 figures, accepted in AJ

arXiv:2502.20393 [pdf, other]

Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models

Authors: Susmit Agrawal, Deepika Vemuri, Sri Siddarth Chakaravarthy P, Vineeth N. Balasubramanian

Abstract: Concept-based methods have emerged as a promising direction to develop interpretable neural networks in standard supervised settings. However, most works that study them in incremental settings assume either a static concept set across all experiences or assume that each experience relies on a distinct set of concepts. In this work, we study concept-based models in a more realistic, dynamic setting where new classes may rely on older concepts in addition to introducing new concepts themselves. We show that concepts and classes form a complex web of relationships, which is susceptible to degradation and needs to be preserved and augmented across experiences. We introduce new metrics to show that existing concept-based models cannot preserve these relationships even when trained using methods to prevent catastrophic forgetting, since they cannot handle forgetting at concept, class, and concept-class relationship levels simultaneously. To address these issues, we propose a novel method, MuCIL, that uses multimodal concepts to perform classification without increasing the number of trainable parameters across experiences. The multimodal concepts are aligned to concepts provided in natural language, making them interpretable by design. Through extensive experimentation, we show that our approach obtains state-of-the-art classification performance compared to other concept-based models, achieving over 2$\times$ the classification performance in some cases. We also study the ability of our model to perform interventions on concepts, and show that it can localize visual concepts in input images, providing post-hoc interpretations.

Submitted 27 February, 2025; originally announced February 2025.

Comments: 8 pages of main text, 6 figures in main text, 11 pages of Appendix, published in AAAI 2025

arXiv:2502.17112 [pdf]

Defects in the $β$-Ga$_2$O$_3$($\bar201$)/HfO$_2$ MOS system and the effect of thermal treatments

Authors: Khushabu. S. Agrawal, Paolo LaTorraca, Jonas Valentijn, Roberta Hawkins, Adam A. Gruszecki, Joy Roy, Vasily Lebedev, Lewys Jones, Robert M. Wallace, Chadwin D. Young, Paul K. Hurley, Karim Cherkaoui

Abstract: We have investigated the properties of the $β$-Ga$_2$O$_3$($\bar201$)/HfO$_2$/Cr/Au MOS (metal-oxide-semiconductor) system after annealing (450$^\circ$C) in different ambient conditions (forming gas, N$_2$ and O$_2$). Defect properties have been analyzed using an approach combining experimental impedance measurements with physics-based simulations of the capacitance-voltage (C-V) and conductance-voltage (G-V) characteristics of $β$-Ga$_2$O$_3$/HfO$_2$ MOS capacitors. This approach enabled us to detect two defect bands in HfO$_2$ characterized by thermal ionization energies of ~1.1eV (acceptor-like) and ~2eV (donor-like) attributed to a polaronic self-trapping state and an oxygen vacancy in HfO$_2$, respectively. This study demonstrates how thermal treatments affect the energy distributions and densities of the observed defects. The adopted methodology also enabled the extraction of the spatial distribution of defects across the HfO$_2$ thickness and Cr/HfO$_2$ interface. The high concentration of oxygen vacancies close to the Cr/HfO$_2$ interface extracted from experimental and simulated electrical data is confirmed by in-situ XPS analysis which shows how Cr is scavenging oxygen from the HfO$_2$ and creating the donor band confined near the Cr/HfO$_2$ interface. This donor band density is observed to be reduced after annealing as per simulation and unchanged for different annealing conditions. We speculate this may be due to the formation of dense films and polyforms of HfO$_2$ under different ambient as revealed by high-resolution TEM images.

Submitted 24 February, 2025; originally announced February 2025.

Comments: Main article: 23 pages, 6 figures; Supporting information: 7 pages, 5 figures

arXiv:2502.12701 [pdf, other]

Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral

Authors: António Farinhas, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, André F. T. Martins

Abstract: Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimation (QE) metrics as deferral rules. We show that QE-based deferral allows a cascaded system to match the performance of a larger model while invoking it for a small fraction (30% to 50%) of the examples, significantly reducing computational costs. We validate this approach through both automatic and human evaluation.

Submitted 18 February, 2025; originally announced February 2025.

Comments: Preprint
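
The deferral idea described in the abstract above can be pictured with a minimal sketch. It assumes a fixed score threshold as the deferral rule; the names `cascade_translate`, `small_model`, `large_model`, `qe_score`, and `threshold` are hypothetical placeholders for illustration, not the authors' code or any library's API, and the paper may select deferred examples differently (for instance by a budget on the fraction deferred).

```python
from typing import Callable, Tuple


def cascade_translate(
    source: str,
    small_model: Callable[[str], str],      # hypothetical cheap translator
    large_model: Callable[[str], str],      # hypothetical expensive translator
    qe_score: Callable[[str, str], float],  # hypothetical QE metric: higher is better
    threshold: float,                       # assumed fixed deferral threshold
) -> Tuple[str, bool]:
    """Return (translation, deferred), calling the large model only when needed."""
    draft = small_model(source)
    # Keep the cheap draft when the estimated quality is high enough.
    if qe_score(source, draft) >= threshold:
        return draft, False
    # Otherwise defer this instance to the larger model.
    return large_model(source), True
```

With a threshold chosen on held-out data, the share of calls where `deferred` is true plays the role of the fraction of examples sent to the larger model.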

arXiv:2502.12393 [pdf, other]

Time Series Treatment Effects Analysis with Always-Missing Controls

Authors: Juan Shu, Qiyu Han, George Chen, Xihao Cao, Kangming Luo, Dan Pallotta, Shivam Agrawal, Yuping Lu, Xiaoyu Zhang, Jawad Mansoor, Jyoti Anand

Abstract: Estimating treatment effects in time series data presents a significant challenge, especially when the control group is always unobservable. For example, in analyzing the effects of Christmas on retail sales, we lack direct observation of what would have occurred in late December without the Christmas impact. To address this, we try to recover the control group in the event period while accounting for confounders and temporal dependencies. Experimental results on the M5 Walmart retail sales data demonstrate robust estimation of the potential outcome of the control group as well as accurate predicted holiday effect. Furthermore, we provided theoretical guarantees for the estimated treatment effect, proving its consistency and asymptotic normality. The proposed methodology is applicable not only to this always-missing control scenario but also in other conventional time series causal inference settings.

Submitted 17 February, 2025; originally announced February 2025.

arXiv:2502.10295 [pdf, other]

Fenchel-Young Variational Learning

Authors: Sophia Sklaviadis, Sweta Agrawal, Antonio Farinhas, Andre Martins, Mario Figueiredo

Abstract: From a variational perspective, many statistical learning criteria involve seeking a distribution that balances empirical risk and regularization. In this paper, we broaden this perspective by introducing a new general class of variational methods based on Fenchel-Young (FY) losses, treated as divergences that generalize (and encompass) the familiar Kullback-Leibler divergence at the core of classical variational learning. Our proposed formulation -- FY variational learning -- includes as key ingredients new notions of FY free energy, FY evidence, FY evidence lower bound, and FY posterior. We derive alternating minimization and gradient backpropagation algorithms to compute (or lower bound) the FY evidence, which enables learning a wider class of models than previous variational formulations. This leads to generalized FY variants of classical algorithms, such as an FY expectation-maximization (FYEM) algorithm, and latent-variable models, such as an FY variational autoencoder (FYVAE). Our new methods are shown to be empirically competitive, often outperforming their classical counterparts, and most importantly, to have qualitatively novel features. For example, FYEM has an adaptively sparse E-step, while the FYVAE can support models with sparse observations and sparse posteriors.

Submitted 14 February, 2025; originally announced February 2025.

Comments: Under review

arXiv:2502.02085 [pdf, other]

A New Rejection Sampling Approach to $k$-$\mathtt{means}$++ With Improved Trade-Offs

Authors: Poojan Shah, Shashwat Agrawal, Ragesh Jaiswal

Abstract: The $k$-$\mathtt{means}$++ seeding algorithm (Arthur & Vassilvitskii, 2007) is widely used in practice for the $k$-means clustering problem where the goal is to cluster a dataset $\mathcal{X} \subset \mathbb{R} ^d$ into $k$ clusters. The popularity of this algorithm is due to its simplicity and provable guarantee of being $O(\log k)$ competitive with the optimal solution in expectation. However, its running time is $O(|\mathcal{X}|kd)$, making it expensive for large datasets. In this work, we present a simple and effective rejection sampling based approach for speeding up $k$-$\mathtt{means}$++. Our first method runs in time $\tilde{O}(\mathtt{nnz} (\mathcal{X}) + βk^2d)$ while still being $O(\log k )$ competitive in expectation. Here, $β$ is a parameter which is the ratio of the variance of the dataset to the optimal $k$-$\mathtt{means}$ cost in expectation and $\tilde{O}$ hides logarithmic factors in $k$ and $|\mathcal{X}|$. Our second method presents a new trade-off between computational cost and solution quality. It incurs an additional scale-invariant factor of $ k^{-Ω( m/β)} \operatorname{Var} (\mathcal{X})$ in addition to the $O(\log k)$ guarantee of $k$-$\mathtt{means}$++ improving upon a result of (Bachem et al, 2016a) who get an additional factor of $m^{-1}\operatorname{Var}(\mathcal{X})$ while still running in time $\tilde{O}(\mathtt{nnz}(\mathcal{X}) + mk^2d)$. We perform extensive empirical evaluations to validate our theoretical results and to show the effectiveness of our approach on real datasets.

Submitted 4 February, 2025; originally announced February 2025.
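
For context on the baseline that the abstract above sets out to speed up, here is a minimal sketch of the standard $k$-means++ ($D^2$) seeding of Arthur & Vassilvitskii. It is not the rejection-sampling method proposed in the paper; the function names `sq_dist` and `kmeans_pp_seed` and the fixed default seed are illustrative assumptions. Each round touches every point once, which is where the $O(|\mathcal{X}|kd)$ running time comes from.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points: List[Point], k: int,
                   rng: Optional[random.Random] = None) -> List[Point]:
    """Pick k initial centers with standard D^2 sampling (baseline k-means++)."""
    rng = rng or random.Random(0)  # fixed seed is an assumption, for repeatability
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
            continue
        # Sample an index with probability proportional to d2[i].
        r = rng.random() * total
        acc = 0.0
        chosen = len(points) - 1
        for i, w in enumerate(d2):
            acc += w
            if r < acc:
                chosen = i
                break
        c = points[chosen]
        centers.append(c)
        # One full pass over the data per new center.
        d2 = [min(old, sq_dist(p, c)) for old, p in zip(d2, points)]
    return centers
```

A call such as `kmeans_pp_seed([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)], 2)` returns two of the input points as starting centers for a subsequent Lloyd iteration.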

arXiv:2501.16573 [pdf, other]

Optimization Landscapes Learned: Proxy Networks Boost Convergence in Physics-based Inverse Problems

Authors: Girnar Goyal, Philipp Holl, Sweta Agrawal, Nils Thuerey

Abstract: Solving inverse problems in physics is central to understanding complex systems and advancing technologies in various fields. Iterative optimization algorithms, commonly used to solve these problems, often encounter local minima, chaos, or regions with zero gradients. This is due to their overreliance on local information and highly chaotic inverse loss landscapes governed by underlying partial differential equations (PDEs). In this work, we show that deep neural networks successfully replicate such complex loss landscapes through spatio-temporal trajectory inputs. They also offer the potential to control the underlying complexity of these chaotic loss landscapes during training through various regularization methods. We show that optimizing on network-smoothened loss landscapes leads to improved convergence in predicting optimum inverse parameters over conventional momentum-based optimizers such as BFGS on multiple challenging problems.

Submitted 27 January, 2025; originally announced January 2025.

Comments: Ongoing work

arXiv:2501.05078 [pdf, other]

Analyzing Memorization in Large Language Models through the Lens of Model Attribution

Authors: Tarun Ram Menta, Susmit Agrawal, Chirag Agarwal

Abstract: Large Language Models (LLMs) are prevalent in modern applications but often memorize training data, leading to privacy breaches and copyright issues. Existing research has mainly focused on posthoc analyses, such as extracting memorized content or developing memorization metrics, without exploring the underlying architectural factors that contribute to memorization. In this work, we investigate memorization from an architectural lens by analyzing how attention modules at different layers impact its memorization and generalization performance. Using attribution techniques, we systematically intervene in the LLM architecture by bypassing attention modules at specific blocks while keeping other components like layer normalization and MLP transformations intact. We provide theorems analyzing our intervention mechanism from a mathematical view, bounding the difference in layer outputs with and without our attributions. Our theoretical and empirical analyses reveal that attention modules in deeper transformer blocks are primarily responsible for memorization, whereas earlier blocks are crucial for the model's generalization and reasoning capabilities. We validate our findings through comprehensive experiments on different LLM families (Pythia and GPTNeo) and five benchmark datasets. Our insights offer a practical approach to mitigate memorization in LLMs while preserving their performance, contributing to safer and more ethical deployment in real world applications.

Submitted 9 January, 2025; originally announced January 2025.

arXiv:2501.04359 [pdf, other]

Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation

Authors: Terrance Yu-Hao Chen, Yulin Chen, Pontus Soederhaell, Sadrishya Agrawal, Kateryna Shapovalenko

Abstract: Decoding speech from non-invasive brain signals, such as electroencephalography (EEG), has the potential to advance brain-computer interfaces (BCIs), with applications in silent communication and assistive technologies for individuals with speech impairments. However, EEG-based speech decoding faces major challenges, such as noisy data, limited datasets, and poor performance on complex tasks like speech perception. This study attempts to address these challenges by employing variational autoencoders (VAEs) for EEG data augmentation to improve data quality and applying a state-of-the-art (SOTA) sequence-to-sequence deep learning architecture, originally successful in electromyography (EMG) tasks, to EEG-based speech decoding. Additionally, we adapt this architecture for word classification tasks. Using the Brennan dataset, which contains EEG recordings of subjects listening to narrated speech, we preprocess the data and evaluate both classification and sequence-to-sequence models for EEG-to-words/sentences tasks. Our experiments show that VAEs have the potential to reconstruct artificial EEG data for augmentation. Meanwhile, our sequence-to-sequence model achieves more promising performance in generating sentences compared to our classification model, though both remain challenging tasks. These findings lay the groundwork for future research on EEG speech perception decoding, with possible extensions to speech production tasks such as silent or imagined speech.

Submitted 8 January, 2025; originally announced January 2025.

Comments: 19 pages, 15 figures, 2 tables

MSC Class: 68T07; 92C55 ACM Class: H.5.2; I.2.6; J.3

arXiv:2412.16540 [pdf, other]

Prior2Posterior: Model Prior Correction for Long-Tailed Learning

Authors: S Divakar Bhat, Amit More, Mudit Soni, Surbhi Agrawal

Abstract: Learning-based solutions for long-tailed recognition face difficulties in generalizing on balanced test datasets. Due to imbalanced data prior, the learned \textit{a posteriori} distribution is biased toward the most frequent (head) classes, leading to an inferior performance on the least frequent (tail) classes. In general, the performance can be improved by removing such a bias by eliminating the effect of imbalanced prior modeled using the number of class samples (frequencies). We first observe that the \textit{effective prior} on the classes, learned by the model at the end of the training, can differ from the empirical prior obtained using class frequencies. Thus, we propose a novel approach to accurately model the effective prior of a trained model using \textit{a posteriori} probabilities. We propose to correct the imbalanced prior by adjusting the predicted \textit{a posteriori} probabilities (Prior2Posterior: P2P) using the calculated prior in a post-hoc manner after the training, and show that it can result in improved model performance. We present theoretical analysis showing the optimality of our approach for models trained with naive cross-entropy loss as well as logit adjusted loss. Our experiments show that the proposed approach achieves new state-of-the-art (SOTA) on several benchmark datasets from the long-tail literature in the category of logit adjustment methods. Further, the proposed approach can be used to inspect any existing method to capture the \textit{effective prior} and remove any residual bias to improve its performance, post-hoc, without model retraining. We also show that by using the proposed post-hoc approach, the performance of many existing methods can be improved further.

Submitted 21 December, 2024; originally announced December 2024.

Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

arXiv:2412.04552 [pdf, other]

True mass and atmospheric composition of the non-transiting hot Jupiter HD 143105 b

Authors: Luke Finnerty, Yinzi Xin, Jerry W. Xuan, Julie Inglis, Michael P Fitzgerald, Shubh Agrawal, Ashley Baker, Geoffrey A. Blake, Benjamin Calvin, Sylvain Cetre, Jacques-Robert Delorme, Greg Doppman, Daniel Echeverri, Katelyn Horstman, Chih-Chun Hsu, Nemanja Jovanovic, Joshua Liberman, Ronald A. López, Emily C. Martin, Dimitri Mawet, Evan Morris, Jacklyn Pezzato-Rovner, Jean-Baptiste Ruffio, Ben Sappey, Tobias Schofield, et al. (6 additional authors not shown)

Abstract: We present Keck/KPIC phase II $K$-band observations of the non-transiting hot Jupiter HD 143105 b. Using a cross-correlation approach, we make the first detection of the planetary atmosphere at $K_p = 185^{+11}_{-13}\rm km\ s^{-1}$ and an inferior conjunction time 2.5 hours before the previously-published ephemeris. The retrieved $K_p$ value, in combination with orbital period, mass of the host star, and lack of transit detection, gives an orbital inclination of $78^{\circ+2}_{-12}$ and a true planet mass of 1.23$\pm0.10\rm\ M_J$. While the equilibrium temperature of HD 143105 b is in the transition regime between non-inverted and inverted atmospheres, our analysis strongly prefers a non-inverted atmosphere. Retrieval analysis indicates the atmosphere of HD 143105 b is cloud-free to approximately 1 bar and dominated by H$_2$O absorption ($\log \rm H_2O_{MMR} = -3.9^{+0.8}_{-0.5}$), placing only an upper limit on the CO abundance ($\log \rm CO_{MMR} < -3.7$ at 95% confidence). We place no constraints on the abundances of Fe, Mg, or $^{13}$CO. From these abundances, we place an upper limit on the carbon-to-oxygen ratio for HD 143105 b, $\rm C/O < 0.2$ at 95% confidence, and find the atmospheric metallicity is approximately $0.1\times$ solar. The low metallicity may be responsible for the lack of a thermal inversion, which at the temperature of HD 143105 b would likely require significant opacity from TiO and/or VO. With these results, HD 143105 b joins the small number of non-transiting hot Jupiters with detected atmospheres.

Submitted 5 December, 2024; originally announced December 2024.

Comments: 19 pages, 7 figures, 2 tables. Accepted in AJ

arXiv:2412.04205 [pdf, ps, other]

A Context-aware Framework for Translation-mediated Conversations

Authors: José Pombal, Sweta Agrawal, Patrick Fernandes, Emmanouil Zaranis, André F. T. Martins

Abstract: Automatic translation systems offer a powerful solution to bridge language barriers in scenarios where participants do not share a common language. However, these systems can introduce errors leading to misunderstandings and conversation breakdown. A key issue is that current systems fail to incorporate the rich contextual information necessary to resolve ambiguities and omitted details, resulting in literal, inappropriate, or misaligned translations. In this work, we present a framework to improve large language model-based translation systems by incorporating contextual information in bilingual conversational settings during training and inference. We validate our proposed framework on two task-oriented domains: customer chat and user-assistant interaction. Across both settings, the system produced by our framework, TowerChat, consistently results in better translations than state-of-the-art systems like GPT-4o and TowerInstruct, as measured by multiple automatic translation quality metrics on several language pairs. We also show that the resulting model leverages context in an intended and interpretable way, improving consistency between the conveyed message and the generated translations.

Submitted 29 June, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

arXiv:2412.01647 [pdf, other]

doi 10.1007/JHEP03(2025)208

Celestial $sw_{1+\infty}$ algebra in Einstein-Yang-Mills theory

Authors: Shreyansh Agrawal, Panagiotis Charalambous, Laura Donnay

Abstract: From a study of the subleading structure of the asymptotic equations of motion in Einstein-Yang-Mills theory, we construct charges that are conserved up to quadratic order in non-radiative vacuum. We then show that these higher spin charges obey the celestial $sw_{1+\infty}$ symmetry algebra found earlier from the OPE of positive-helicity conformally soft gluons and gravitons.

Submitted 30 April, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

Comments: 31+15 pages

arXiv:2411.14983 [pdf, other]

Large sample scaling analysis of the Zig-Zag algorithm for Bayesian inference

Authors: Sanket Agrawal, Joris Bierkens, Gareth O. Roberts

Abstract: Piecewise deterministic Markov processes provide scalable methods for sampling from the posterior distributions in big data settings by admitting principled sub-sampling strategies that do not bias the output. An important example is the Zig-Zag process of [Ann. Stats. 47 (2019) 1288 - 1320] where clever sub-sampling has been shown to produce an essentially independent sample at a cost that does not scale with the size of the data. However, sub-sampling also leads to slower convergence and poor mixing of the process, a behaviour which questions the promised scalability of the algorithm. We provide a large sample scaling analysis of the Zig-Zag process and its sub-sampling versions in settings of parametric Bayesian inference. In the transient phase of the algorithm, we show that the Zig-Zag trajectories are well approximated by the solution to a system of ODEs. These ODEs possess a drift in the direction of decreasing KL-divergence between the assumed model and the true distribution and are explicitly characterized in the paper. In the stationary phase, we give weak convergence results for different versions of the Zig-Zag process. Based on our results, we estimate that for large data sets of size n, using suitable control variates with sub-sampling in Zig-Zag, the algorithm costs O(1) to obtain an essentially independent sample; a computational speed-up of O(n) over the canonical version of Zig-Zag and other traditional MCMC methods.

Submitted 22 November, 2024; originally announced November 2024.

Comments: 47 pages, 7 figures, 1 table

MSC Class: 62-08; 60F05; 62F15; 65C05

arXiv:2411.13690 [pdf, other]

Multi-Agent Best Arm Identification in Stochastic Linear Bandits

Authors: Sanjana Agrawal, Saúl A. Blanco

Abstract: We study the problem of collaborative best-arm identification in stochastic linear bandits under a fixed-budget scenario. In our learning model, we first consider multiple agents connected through a star network, interacting with a linear bandit instance in parallel. We then extend our analysis to arbitrary network topologies. The objective of the agents is to collaboratively identify the best arm of the given bandit instance with the help of a central server while minimizing the probability of error in best arm estimation. To this end, we propose two algorithms, MaLinBAI-Star and MaLinBAI-Gen, for star networks and networks with arbitrary structure, respectively. Both algorithms utilize the technique of G-optimal design along with a successive-elimination-based strategy where agents share their knowledge through a central server at each communication round. We demonstrate, both theoretically and empirically, that our algorithms achieve exponentially decaying probability of error in the allocated time budget. Furthermore, experimental results on both synthetic and real-world data validate the effectiveness of our algorithms over existing state-of-the-art multi-agent algorithms.

Submitted 24 May, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

Comments: Updated algorithms, corrected proofs, fixed typos

MSC Class: 93E35 ACM Class: I.2.6

arXiv:2411.12497 [pdf, ps, other]

doi 10.1103/PhysRevD.111.034022

Small-$x$ evolution of dipole amplitude in momentum space: forward--off-forward correspondence

Authors: Sanskriti Agrawal, Raktim Abir

Abstract: We have shown that the small-$x$ evolution of the off-forward leading-log dipole scattering amplitudes, both pomeron and odderon, in the momentum space can be completely determined by the evolution of the respective forward amplitudes, with rescaled momenta. In position space, if there is translation symmetry (assumption of a large nucleus), the dipole cross section depends on the positions of quarks and anti-quarks only through their separation. The present study is an equivalent proposition in the momentum space -- where translation symmetry in momentum bifurcates the amplitudes into two translationally symmetric functions along the ${\bf k}$ line in the ${\bf k}-{\bf Δ}$ plane. It also shows that high energy evolutions of dipole GTMDs can be achieved only by studying the evolution of dipole TMDs at small-$x$.

Submitted 23 November, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

Journal ref: Phys.Rev.D 111 (2025) 3, 034022

arXiv:2411.11937 [pdf, other]

Value Imprint: A Technique for Auditing the Human Values Embedded in RLHF Datasets

Authors: Ike Obi, Rohan Pant, Srishti Shekhar Agrawal, Maham Ghazanfar, Aaron Basiletti

Abstract: LLMs are increasingly fine-tuned using RLHF datasets to align them with human preferences and values. However, very limited research has investigated which specific human values are operationalized through these datasets. In this paper, we introduce Value Imprint, a framework for auditing and classifying the human values embedded within RLHF datasets. To investigate the viability of this framework, we conducted three case study experiments by auditing the Anthropic/hh-rlhf, OpenAI WebGPT Comparisons, and Alpaca GPT-4-LLM datasets to examine the human values embedded within them. Our analysis involved a two-phase process. During the first phase, we developed a taxonomy of human values through an integrated review of prior works from philosophy, axiology, and ethics. Then, we applied this taxonomy to annotate 6,501 RLHF preferences. During the second phase, we employed the labels generated from the annotation as ground truth data for training a transformer-based machine learning model to audit and classify the three RLHF datasets. Through this approach, we discovered that information-utility values, including Wisdom/Knowledge and Information Seeking, were the most dominant human values within all three RLHF datasets. In contrast, prosocial and democratic values, including Well-being, Justice, and Human/Animal Rights, were the least represented human values. These findings have significant implications for developing language models that align with societal values and norms. We contribute our datasets to support further research in this area.

Submitted 18 November, 2024; originally announced November 2024.

arXiv:2411.05986 [pdf, other]

Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings

Authors: Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins

Abstract: Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, leading to inefficient learning signals due to the reward sparsity problem -- the model receives a single score for the entire sentence. To address this, we propose a novel approach that leverages fine-grained, token-level quality assessments along with error severity levels using RL methods. Specifically, we use xCOMET, a state-of-the-art quality estimation system, as our token-level reward model. We conduct experiments on small and large translation datasets with standard encoder-decoder and large language models-based machine translation systems, comparing the impact of sentence-level versus fine-grained reward signals on translation quality. Our results show that training with token-level rewards improves translation quality across language pairs over baselines according to both automatic and human evaluation. Furthermore, token-level reward optimization improves training stability, evidenced by a steady increase in mean rewards over training epochs.

Submitted 16 April, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

Comments: 12 pages, work-in-progress

arXiv:2410.19500 [pdf, other]

Microscopy of bosonic charge carriers in staggered magnetic fields

Authors: Annabelle Bohrdt, David Wei, Daniel Adler, Kritsana Srakaew, Suchita Agrawal, Pascal Weckesser, Immanuel Bloch, Fabian Grusdt, Johannes Zeiher

Abstract: The interplay of spin and charge degrees of freedom is believed to underlie various unresolved phenomena in strongly correlated systems. Quantum simulators based on neutral atoms provide an excellent testbed for investigating such phenomena and resolving their microscopic origins. Up to now, the majority of experimental and theoretical studies has focused on systems with fermionic exchange statistics. Here we expand the existing cold atom toolbox through the use of negative temperature states, enabling us to realize an antiferromagnetic, bosonic $t-J$ model in two spatial dimensions, subject to a strong staggered magnetic field in a quantum gas microscope. Through comparison of the spreading dynamics of a single hole in a Néel versus a spin-polarized initial state, we establish the relevance of memory effects resulting from the buildup of strong spin-charge correlations in the dynamics of charge carriers in antiferromagnets. We further numerically predict rich dynamics of pairs of doped holes, which we demonstrate to be bound by a similar memory effect, while their center-of-mass can expand freely. Our work paves the way for the systematic exploration of the effect of antiferromagnetic spin ordering on the properties of individual charge carriers as well as finite doping phases: Our study demonstrates that the staggered field can be used to single out the effect of antiferromagnetism and holds the prospect to prepare low-temperature states in the near future.

Submitted 25 October, 2024; originally announced October 2024.

Comments: 9+3 pages, 4+5 figures

arXiv:2410.18351 [pdf, other]

AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability

Authors: Sudhanshu Agrawal, Wonseok Jeon, Mingu Lee

Abstract: Speculative decoding is a powerful technique that attempts to circumvent the autoregressive constraint of modern Large Language Models (LLMs). The aim of speculative decoding techniques is to improve the average inference time of a large, target model without sacrificing its accuracy, by using a more efficient draft model to propose draft tokens which are then verified in parallel. The number of draft tokens produced in each drafting round is referred to as the draft length and is often a static hyperparameter chosen based on the acceptance rate statistics of the draft tokens. However, setting a static draft length can negatively impact performance, especially in scenarios where drafting is expensive and there is a high variance in the number of tokens accepted. Adaptive Entropy-based Draft Length (AdaEDL) is a simple, training-free and parameter-free criterion that allows for early stopping of the token drafting process by approximating a lower bound on the expected acceptance probability of the drafted token based on the currently observed entropy of the drafted logits. We show that AdaEDL consistently outperforms static draft-length speculative decoding by 10%-57% as well as other training-free draft-stopping techniques by up to 10% in a variety of settings and datasets. At the same time, we show that AdaEDL is more robust than these techniques and preserves performance in high-sampling-temperature scenarios. Since it is training-free, in contrast to techniques that rely on the training of dataset-specific draft-stopping predictors, AdaEDL can seamlessly be integrated into a variety of pre-existing LLM systems.

Submitted 23 October, 2024; originally announced October 2024.

Comments: Workshop on Efficient Natural Language and Signal Processing at NeurIPS 2024

arXiv:2410.17709 [pdf, other]

Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure

Authors: Chaoyun Zhang, Randolph Yao, Si Qin, Ze Li, Shekhar Agrawal, Binit R. Mishra, Tri Tran, Minghua Ma, Qingwei Lin, Murali Chintalapati, Dongmei Zhang

Abstract: The presence of unhealthy nodes in cloud infrastructure signals the potential failure of machines, which can significantly impact the availability and reliability of cloud services, resulting in negative customer experiences. Effectively addressing unhealthy node mitigation is therefore vital for sustaining cloud system performance. This paper introduces Deoxys, a causal inference engine tailored to recommending mitigation actions for unhealthy nodes in cloud systems to minimize virtual machine downtime and interruptions during unhealthy events. It employs double machine learning combined with causal forest to produce precise and reliable mitigation recommendations based solely on limited observational data collected from the historical unhealthy events. To enhance the causal inference model, Deoxys further incorporates a policy fallback mechanism based on model uncertainty and action overriding mechanisms to (i) improve the reliability of the system, and (ii) strike a good tradeoff between downtime reduction and resource utilization, thereby enhancing the overall system performance. After deploying Deoxys in a large-scale cloud infrastructure at Microsoft, our observations demonstrate that Deoxys significantly reduces average VM downtime by 53% compared to a legacy policy, while leading to a 49.5% lower VM interruption rate. This substantial improvement enhances the reliability and stability of cloud platforms, resulting in a seamless customer experience.

Submitted 23 October, 2024; originally announced October 2024.

arXiv:2410.15930 [pdf, other]

Centrality-aware Product Retrieval and Ranking

Authors: Hadeel Saadany, Swapnil Bhosale, Samarth Agrawal, Diptesh Kanojia, Constantin Orasan, Zhe Wu

Abstract: This paper addresses the challenge of improving user experience on e-commerce platforms by enhancing product ranking relevant to users' search queries. Ambiguity and complexity of user queries often lead to a mismatch between the user's intent and retrieved product titles or documents. Recent approaches have proposed the use of Transformer-based models, which need millions of annotated query-title pairs during the pre-training stage, and this data often does not take user intent into account. To tackle this, we curate samples from existing datasets at eBay, manually annotated with buyer-centric relevance scores and centrality scores, which reflect how well the product title matches the users' intent. We introduce a User-intent Centrality Optimization (UCO) approach for existing models, which optimises for the user intent in semantic product search. To that end, we propose a dual-loss based optimisation to handle hard negatives, i.e., product titles that are semantically relevant but do not reflect the user's intent. Our contributions include curating challenging evaluation sets and implementing UCO, resulting in significant product ranking efficiency improvements observed for different evaluation metrics. Our work aims to ensure that the most buyer-centric titles for a query are ranked higher, thereby enhancing the user experience on e-commerce platforms.

Submitted 21 October, 2024; originally announced October 2024.

Comments: EMNLP 2024: Industry track

arXiv:2410.11624 [pdf, other]

Findings of the WMT 2024 Shared Task on Chat Translation

Authors: Wafaa Mohammed, Sweta Agrawal, M. Amin Farajian, Vera Cabarrão, Bryan Eikema, Ana C. Farinha, José G. C. de Souza

Abstract: This paper presents the findings from the third edition of the Chat Translation Shared Task. As with previous editions, the task involved translating bilingual customer support conversations, specifically focusing on the impact of conversation context in translation quality and evaluation. We also include two new language pairs: English-Korean and English-Dutch, in addition to the set of language pairs from previous editions: English-German, English-French, and English-Brazilian Portuguese. We received 22 primary submissions and 32 contrastive submissions from eight teams, with each language pair having participation from at least three teams. We evaluated the systems comprehensively using both automatic metrics and human judgments via a direct assessment framework. The official rankings for each language pair were determined based on human evaluation scores, considering performance in both translation directions--agent and customer. Our analysis shows that while the systems excelled at translating individual turns, there is room for improvement in overall conversation-level translation quality.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 12 pages, 5 figures, 13 tables

arXiv:2410.10995 [pdf, ps, other]

Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation

Authors: Emmanouil Zaranis, Giuseppe Attanasio, Sweta Agrawal, André F. T. Martins

Abstract: Quality estimation (QE), the automatic assessment of translation quality, has recently become crucial across several stages of the translation pipeline, from data curation to training and decoding. While QE metrics have been optimized to align with human judgments, whether they encode social biases has been largely overlooked. Biased QE risks favoring certain demographic groups over others, e.g., by exacerbating gaps in visibility and usability. This paper defines and investigates gender bias of QE metrics and discusses its downstream implications for machine translation (MT). Experiments with state-of-the-art QE metrics across multiple domains, datasets, and languages reveal significant bias. When a human entity's gender in the source is undisclosed, masculine-inflected translations score higher than feminine-inflected ones, and gender-neutral translations are penalized. Even when contextual cues disambiguate gender, using context-aware QE metrics leads to more errors in selecting the correct translation inflection for feminine referents than for masculine ones. Moreover, a biased QE metric affects data filtering and quality-aware decoding. Our findings underscore the need for a renewed focus on developing and evaluating QE metrics centered on gender.

Submitted 2 June, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

Comments: ACL 2025

arXiv:2410.07779 [pdf, other]

Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation

Authors: Sweta Agrawal, José G. C. de Souza, Ricardo Rei, António Farinhas, Gonçalo Faria, Patrick Fernandes, Nuno M Guerreiro, Andre Martins

Abstract: Alignment with human preferences is an important step in developing accurate and safe large language models. This is no exception in machine translation (MT), where better handling of language nuances and context-specific variations leads to improved quality. However, preference data based on human feedback can be very expensive to obtain and curate at a large scale. Automatic metrics, on the other hand, can induce preferences, but they might not match human expectations perfectly. In this paper, we propose an approach that leverages the best of both worlds. We first collect sentence-level quality assessments from professional linguists on translations generated by multiple high-quality MT systems and evaluate the ability of current automatic metrics to recover these preferences. We then use this analysis to curate a new dataset, MT-Pref (metric induced translation preference) dataset, which comprises 18k instances covering 18 language directions, using texts sourced from multiple domains post-2022. We show that aligning TOWER models on MT-Pref significantly improves translation quality on WMT23 and FLORES benchmarks.

Submitted 10 October, 2024; originally announced October 2024.

Comments: Accepted at EMNLP Main 2024

Showing 1–50 of 331 results for author: Agrawal, S