-
Scalable Learning of High-Dimensional Demonstrations with Composition of Linear Parameter Varying Dynamical Systems
Authors:
Shreenabh Agrawal,
Hugo T. M. Kussaba,
Lingyun Chen,
Allen Emmanuel Binny,
Abdalla Swikir,
Pushpak Jagtap,
Sami Haddadin
Abstract:
Learning from Demonstration (LfD) techniques enable robots to learn and generalize tasks from user demonstrations, eliminating the need for coding expertise among end-users. One established technique to implement LfD in robots is to encode demonstrations in a stable Dynamical System (DS). However, finding a stable dynamical system entails solving an optimization problem with bilinear matrix inequality (BMI) constraints, a non-convex problem which, depending on the number of scalar constraints and variables, demands significant computational resources and is susceptible to numerical issues such as floating-point errors. To address these challenges, we propose a novel compositional approach that enhances the applicability and scalability of learning stable DSs with BMIs.
Submitted 5 July, 2025;
originally announced July 2025.
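For the LPV dynamical-system entry above, the stability conditions can be made concrete. The sketch below assumes the common LPV-DS form $\dot{x} = \sum_k \gamma_k(x) A_k (x - x^*)$ with a shared quadratic Lyapunov function $V(x) = (x - x^*)^\top P (x - x^*)$, which requires $P \succ 0$ and $A_k^\top P + P A_k \prec 0$ for every $k$. It only checks these conditions for given matrices. When $A_k$ and $P$ are both decision variables, the product $A_k^\top P$ makes the constraints bilinear, which is the BMI difficulty the abstract describes. The function names are hypothetical, and this is not the authors' compositional method.

```python
import numpy as np

def is_common_lyapunov(P, A_list, tol=1e-9):
    """Check P > 0 and A_k^T P + P A_k < 0 for every local matrix A_k."""
    P = 0.5 * (P + P.T)                          # symmetrize against round-off
    if np.linalg.eigvalsh(P).min() <= tol:       # P must be positive definite
        return False
    for A in A_list:
        M = A.T @ P + P @ A                      # symmetric because P is symmetric
        if np.linalg.eigvalsh(M).max() >= -tol:  # each term must be negative definite
            return False
    return True

def lpv_velocity(x, x_star, A_list, gammas):
    """Evaluate xdot = sum_k gamma_k * A_k (x - x_star) for given scheduling weights."""
    e = np.asarray(x, dtype=float) - np.asarray(x_star, dtype=float)
    return sum(g * (A @ e) for g, A in zip(gammas, A_list))
```

In a learned model the weights $\gamma_k(x)$ would come from the data (for example, a mixture model); here they are passed in directly to keep the check independent of how they are obtained.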
-
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
Authors:
Raghavv Goel,
Sudhanshu Agrawal,
Mukul Gagrani,
Junyoung Park,
Yifan Zao,
He Zhang,
Tian Liu,
Yiping Yang,
Xin Yuan,
Jiuyan Lu,
Chris Lott,
Mingu Lee
Abstract:
In this paper, we introduce a simple training-free technique to improve the performance of drafter-based speculative decoding (SpD) methods that incorporate a language modeling head (LM head) during the drafting process. Drafter-based speculative decoding leverages one or more smaller language models, a.k.a. drafters or draft models, to sample a draft sequence or tree consisting of multiple tokens, followed by verification by a base LLM (the target model), which accepts a subset as its valid generation. Since speculative decoding is usually considered to require a one-to-one mapping between the vocabularies of the target model and the draft model, it has been natural to share the vocabulary between them, or even to share the LM head as in EAGLE or Medusa. We first identify that this draft token sampling scheme inherently contains unnecessary inference overhead in drafting, especially for some target LLMs with very large vocabularies. Then, we propose a simple technique, VocabTrim, to mitigate the drafting overhead and improve generation speed in memory-bound environments. VocabTrim reconstructs the drafter LM head to contain only a limited set of tokens, selected as those most frequently sampled from the vocabulary of the target model. While limiting the vocabulary in drafting slightly degrades the acceptance rate, it significantly reduces the drafting latency in memory-bound processes, which is often the case on edge devices, resulting in a higher memory-bound speed-up (MBSU). We show that our method can boost the memory-bound speed-up for Llama-3 models on Spec-Bench, specifically by 16% for Llama-3.2-3B-Instruct.
Submitted 3 July, 2025; v1 submitted 27 June, 2025;
originally announced June 2025.
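A minimal sketch of the VocabTrim idea as described in the abstract, assuming the drafter LM head is a `(vocab_size, hidden_dim)` weight matrix and that token frequencies are counted from target-model samples on some calibration text. `trim_lm_head` and `draft_next_token` are hypothetical names, and greedy drafting is used only for brevity.

```python
from collections import Counter
import numpy as np

def trim_lm_head(lm_head, sampled_token_ids, k):
    """Keep only the rows of the drafter LM head for the k most frequent tokens.

    lm_head: array of shape (vocab_size, hidden_dim).
    sampled_token_ids: target-model token ids gathered from calibration samples.
    Returns (trimmed_head, kept_ids); kept_ids maps a trimmed row to its original id.
    """
    counts = Counter(sampled_token_ids)
    kept_ids = np.array([tok for tok, _ in counts.most_common(k)], dtype=np.int64)
    return lm_head[kept_ids], kept_ids

def draft_next_token(hidden, trimmed_head, kept_ids):
    """Greedy draft step over the trimmed vocabulary; returns an original token id."""
    logits = trimmed_head @ hidden      # k dot products instead of vocab_size
    return int(kept_ids[np.argmax(logits)])
```

Because the drafter computes only `k` logits per step, the LM-head projection shrinks from `vocab_size` rows to `k` rows. The trade-off is that a token outside `kept_ids` can never be drafted, which is consistent with the slight drop in acceptance rate the abstract reports.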
-
NEAR$^2$: A Nested Embedding Approach to Efficient Product Retrieval and Ranking
Authors:
Shenbin Qian,
Diptesh Kanojia,
Samarth Agrawal,
Hadeel Saadany,
Swapnil Bhosale,
Constantin Orasan,
Zhe Wu
Abstract:
E-commerce information retrieval (IR) systems struggle to simultaneously achieve high accuracy in interpreting complex user queries and maintain efficient processing of vast product catalogs. The dual challenge lies in precisely matching user intent with relevant products while managing the computational demands of real-time search across massive inventories. In this paper, we propose a Nested Embedding Approach to product Retrieval and Ranking, called NEAR$^2$, which achieves up to $12$ times greater efficiency in embedding size at inference time while introducing no extra training cost and improving accuracy for various encoder-based Transformer models. We validate our approach using different loss functions for the retrieval and ranking tasks, including multiple negative ranking loss and online contrastive loss, on four different test sets with various IR challenges such as short and implicit queries. Our approach achieves improved performance with a smaller embedding dimension compared to existing models.
Submitted 24 June, 2025;
originally announced June 2025.
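Nested (Matryoshka-style) embeddings are commonly trained so that a prefix of the embedding vector is itself a usable embedding. Assuming NEAR$^2$ works this way (the abstract does not spell out the mechanism), the inference-time savings come from truncating vectors before scoring; for example, keeping 64 of 768 dimensions is a 12-fold reduction. The helper names below are hypothetical.

```python
import numpy as np

def truncate_and_normalize(emb, dim):
    """Keep the first `dim` coordinates of each embedding and L2-normalize them."""
    prefix = emb[..., :dim]
    norms = np.linalg.norm(prefix, axis=-1, keepdims=True)
    return prefix / np.clip(norms, 1e-12, None)   # guard against zero vectors

def rank_products(query_emb, product_embs, dim):
    """Rank products by cosine similarity computed on a `dim`-sized prefix only."""
    q = truncate_and_normalize(query_emb, dim)
    p = truncate_and_normalize(product_embs, dim)
    scores = p @ q
    return np.argsort(-scores)                    # best match first
```

The ranking can change with the prefix length, which is why such models are trained so that short prefixes remain discriminative on their own.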
-
Null infinity as an inverted extremal horizon: Matching an infinite set of conserved quantities for gravitational perturbations
Authors:
Shreyansh Agrawal,
Panagiotis Charalambous,
Laura Donnay
Abstract:
Every spacetime that is asymptotically flat near null infinity can be conformally mapped via a spatial inversion onto the geometry around an extremal, non-rotating and non-expanding horizon. We set up a dictionary for this geometric duality, connecting the geometry and physics near null infinity to those near the dual horizon. We then study its physical implications for conserved quantities for extremal black holes, extending previously known results to the case of gravitational perturbations. In particular, we derive a tower of near-horizon gravitational charges that are exactly conserved and show their one-to-one matching with Newman-Penrose conserved quantities associated with gravitational perturbations of the extremal Reissner-Nordström black hole geometry. We furthermore demonstrate the physical relevance of spatial inversions for extremal Kerr-Newman black holes, even if the latter are notoriously not conformally isometric under such inversions.
Submitted 18 June, 2025;
originally announced June 2025.
-
Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems
Authors:
Pengfei He,
Zhenwei Dai,
Xianfeng Tang,
Yue Xing,
Hui Liu,
Jingying Zeng,
Qiankun Peng,
Shrivats Agrawal,
Samarth Varshney,
Suhang Wang,
Jiliang Tang,
Qi He
Abstract:
Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. While some existing studies address trustworthiness, they focus on a single type of harmfulness rather than analyzing it holistically from multiple trustworthiness perspectives. In this work, we propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness. Inspired by human communication literature[1], through systematically analyzing attention behaviors across six orthogonal trust dimensions, we find that certain attention heads in the LLM specialize in detecting specific types of violations. Leveraging these insights, A-Trust directly infers trustworthiness from internal attention patterns without requiring external prompts or verifiers. Building upon A-Trust, we develop a principled and efficient trust management system (TMS) for LLM-MAS, enabling both message-level and agent-level trust assessment. Experiments across diverse multi-agent settings and tasks demonstrate that applying our TMS significantly enhances robustness against malicious inputs.
Submitted 3 June, 2025;
originally announced June 2025.
-
Reinforcement Learning with Data Bootstrapping for Dynamic Subgoal Pursuit in Humanoid Robot Navigation
Authors:
Chengyang Peng,
Zhihao Zhang,
Shiting Gong,
Sankalp Agrawal,
Keith A. Redmill,
Ayonga Hereid
Abstract:
Safe and real-time navigation is fundamental for humanoid robot applications. However, existing bipedal robot navigation frameworks often struggle to balance computational efficiency with the precision required for stable locomotion. We propose a novel hierarchical framework that continuously generates dynamic subgoals to guide the robot through cluttered environments. Our method comprises a high-level reinforcement learning (RL) planner for subgoal selection in a robot-centric coordinate system and a low-level Model Predictive Control (MPC) based planner which produces robust walking gaits to reach these subgoals. To expedite and stabilize the training process, we incorporate a data bootstrapping technique that leverages a model-based navigation approach to generate a diverse, informative dataset. We validate our method in simulation using the Agility Robotics Digit humanoid across multiple scenarios with random obstacles. Results show that our framework significantly improves navigation success rates and adaptability compared to both the original model-based method and other learning-based methods.
Submitted 2 June, 2025;
originally announced June 2025.
-
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
Authors:
Saurabh Agrawal,
Raj Gohil,
Gopal Kumar Agrawal,
Vikram C M,
Kushal Verma
Abstract:
Speech quality assessment is a critical process in selecting text-to-speech synthesis (TTS) or voice conversion models. Voice synthesis can be evaluated using objective or subjective metrics. Although there are many objective metrics, such as the Perceptual Evaluation of Speech Quality (PESQ), Perceptual Objective Listening Quality Assessment (POLQA), and Short-Time Objective Intelligibility (STOI), none of them is feasible for selecting the best model. On the other hand, subjective metrics like the Mean Opinion Score (MOS) are highly reliable but require a lot of manual effort and are time-consuming. To counter these issues in MOS evaluation, we have developed a novel model, Speaker Agnostic Latent Features (SALF)-Mean Opinion Score (MOS), which is a small-sized, end-to-end, highly generalized and scalable model for predicting MOS on a 5-point scale. We stack sequences of convolutions to obtain latent features of the audio samples, achieving state-of-the-art results based on mean squared error (MSE), Linear Concordance Correlation coefficient (LCC), Spearman Rank Correlation Coefficient (SRCC) and Kendall Rank Correlation Coefficient (KTAU).
Submitted 2 June, 2025;
originally announced June 2025.
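The four evaluation metrics named in the SALF-MOS abstract can be computed with SciPy as shown below. We assume LCC denotes the Pearson linear correlation coefficient, as in common MOS-prediction benchmarks; Lin's concordance correlation coefficient would need a different formula. `mos_metrics` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

def mos_metrics(predicted, true):
    """MSE plus linear (LCC), Spearman (SRCC) and Kendall (KTAU) correlations."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    return {
        "MSE": float(np.mean((predicted - true) ** 2)),
        "LCC": float(pearsonr(predicted, true)[0]),     # assumed: Pearson correlation
        "SRCC": float(spearmanr(predicted, true)[0]),
        "KTAU": float(kendalltau(predicted, true)[0]),
    }
```

The correlations reward getting the ranking and linear trend of systems right, while MSE also penalizes a constant offset: predictions shifted by one MOS point still correlate perfectly but have an MSE of 1.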
-
Q-learning with Posterior Sampling
Authors:
Priyank Agrawal,
Shipra Agrawal,
Azmat Azati
Abstract:
Bayesian posterior sampling techniques have demonstrated superior empirical performance in many exploration-exploitation settings. However, their theoretical analysis remains a challenge, especially in complex settings like reinforcement learning. In this paper, we introduce Q-Learning with Posterior Sampling (PSQL), a simple Q-learning-based algorithm that uses Gaussian posteriors on Q-values for exploration, akin to the popular Thompson Sampling algorithm in the multi-armed bandit setting. We show that in the tabular episodic MDP setting, PSQL achieves a regret bound of $\tilde O(H^2\sqrt{SAT})$, closely matching the known lower bound of $\Omega(H\sqrt{SAT})$. Here, $S$ and $A$ denote the number of states and actions in the underlying Markov Decision Process (MDP), and $T=KH$ with $K$ being the number of episodes and $H$ being the planning horizon. Our work provides several new technical insights into the core challenges in combining posterior sampling with dynamic programming and TD-learning-based RL algorithms, along with novel ideas for resolving those difficulties. We hope this will form a starting point for analyzing this efficient and important algorithmic technique in even more complex RL settings.
Submitted 1 June, 2025;
originally announced June 2025.
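To illustrate the exploration mechanism the PSQL abstract describes (sample Q-values from Gaussian posteriors, then act greedily on the sample, as Thompson Sampling does in bandits), here is a hypothetical tabular episodic sketch. The posterior standard deviation `noise_scale / sqrt(n + 1)`, the optimistic initialization, and the step size $(H+1)/(H+n)$ (familiar from optimistic tabular Q-learning analyses) are all assumptions of this sketch, not the paper's exact update.

```python
import numpy as np

class GaussianPosteriorQ:
    """Hypothetical tabular episodic Q-learning with Gaussian posterior sampling."""

    def __init__(self, H, S, A, noise_scale=1.0, seed=0):
        self.H, self.S, self.A = H, S, A
        self.noise_scale = noise_scale
        self.rng = np.random.default_rng(seed)
        self.mean = np.full((H, S, A), float(H))    # optimistic initial means (assumption)
        self.count = np.zeros((H, S, A), dtype=int)

    def act(self, h, s):
        # One sample per action from N(mean, (noise_scale / sqrt(n + 1))^2); act greedily on it.
        std = self.noise_scale / np.sqrt(self.count[h, s] + 1)
        sample = self.rng.normal(self.mean[h, s], std)
        return int(np.argmax(sample))

    def update(self, h, s, a, r, s_next):
        self.count[h, s, a] += 1
        n = self.count[h, s, a]
        lr = (self.H + 1) / (self.H + n)            # assumed step size; equals 1 on the first visit
        v_next = self.mean[h + 1, s_next].max() if h + 1 < self.H else 0.0
        target = r + v_next                         # one-step TD target
        self.mean[h, s, a] = (1 - lr) * self.mean[h, s, a] + lr * target
```

As visits accumulate, the sampled Q-values concentrate around their means, so exploration fades naturally without an explicit bonus term.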
-
Entanglement Negativity of Spin-Orbit Correlations in a general Qubit-Qudit Setup
Authors:
Sanskriti Agrawal,
Raktim Abir
Abstract:
We present the complete eigenvalue spectrum of the partially transposed density matrix for a pure bipartite quantum state in a generic $2 \otimes n$ Hilbert space. The spectrum contains four non-zero eigenvalues, $$\lambda_{1,2} = \pm\sqrt{A}, \qquad \lambda_{3,4} = \frac{1}{2}\left(1 \pm \sqrt{1-4A}\right),$$ where $A$ is the determinant of the reduced density matrix (traced over the larger subspace). Since $0 \leqslant A \leqslant 1/4$, only one of the four non-trivial eigenvalues is negative. Within this qubit-qudit framework, we further study negativity as a measure of entanglement for the case of spin-orbit correlations of partons inside a proton. The entanglement negativity for spin-orbit correlations is found to be related to the gluon helicity PDF and the Hermitian angle of the associated Hilbert space for linearly polarized protons.
Submitted 10 June, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
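The closed-form spectrum in the qubit-qudit abstract can be checked numerically. The sketch below takes a pure state on $\mathbb{C}^2 \otimes \mathbb{C}^n$, partially transposes its density matrix over the $n$-dimensional factor (the spectrum is the same for either factor), and compares the result with $\pm\sqrt{A}$ and $\tfrac{1}{2}(1 \pm \sqrt{1-4A})$ padded with $2n-4$ zeros. The function names are illustrative.

```python
import numpy as np

def partial_transpose(rho, n):
    """Partial transpose over the n-dimensional factor of a (2n x 2n) density matrix."""
    r = rho.reshape(2, n, 2, n)                            # r[i, a, j, b] = <i a| rho |j b>
    return r.transpose(0, 3, 2, 1).reshape(2 * n, 2 * n)   # swap a <-> b

def pt_spectrum_and_formula(psi, n):
    """Return (numerical PT spectrum, closed-form spectrum, A), both sorted ascending."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    rho = np.outer(psi, psi.conj())
    numeric = np.linalg.eigvalsh(partial_transpose(rho, n))  # Hermitian, ascending order

    m = psi.reshape(2, n)                   # coefficient matrix c_{i a}
    rho_qubit = m @ m.conj().T              # trace over the n-dimensional factor
    A = max(float(np.linalg.det(rho_qubit).real), 0.0)
    root = np.sqrt(max(1.0 - 4.0 * A, 0.0))
    four = [np.sqrt(A), -np.sqrt(A), 0.5 * (1 + root), 0.5 * (1 - root)]
    closed_form = np.sort(np.concatenate([four, np.zeros(2 * n - 4)]))
    return numeric, closed_form, A
```

The negativity, defined as the sum of the magnitudes of the negative eigenvalues, is then simply $\sqrt{A}$.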
-
From Structural Design to Dynamics Modeling: Control-Oriented Development of a 3-RRR Parallel Ankle Rehabilitation Robot
Authors:
Siyuan Zhang,
Yufei Zhang,
Junlin Lyu,
Sunil K. Agrawal
Abstract:
This paper presents the development of a wearable ankle rehabilitation robot based on a 3-RRR spherical parallel mechanism (SPM) to support multi-DOF recovery through pitch, roll, and yaw motions. The system features a compact, ergonomic structure designed for comfort, safety, and compatibility with ankle biomechanics. A complete design-to-dynamics pipeline has been implemented, including structural design, kinematic modeling for motion planning, and Lagrangian-based dynamic modeling for torque estimation and simulation analysis. Preliminary simulations verify stable joint coordination and smooth motion tracking under representative rehabilitation trajectories. The control framework is currently being developed to enhance responsiveness across the workspace. Future work will focus on integrating personalized modeling and adaptive strategies to address kinematic singularities through model-based control. This work establishes a foundational platform for intelligent, personalized ankle rehabilitation, enabling both static training and potential extension to gait-phase-timed assistance.
Submitted 30 May, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Authors:
Ren-Wei Liang,
Chin-Ting Hsu,
Chan-Hung Yu,
Saransh Agrawal,
Shih-Cheng Huang,
Shang-Tse Chen,
Kuan-Hao Huang,
Shao-Hua Sun
Abstract:
Ensuring that large language models (LLMs) are both helpful and harmless is a critical challenge, as overly strict constraints can lead to excessive refusals, while permissive models risk generating harmful content. Existing approaches, such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO), attempt to balance these trade-offs but suffer from performance conflicts, limited controllability, and poor extendability. To address these issues, we propose Preference Vector, a novel framework inspired by task arithmetic. Instead of optimizing multiple preferences within a single objective, we train separate models on individual preferences, extract behavior shifts as preference vectors, and dynamically merge them at test time. This modular approach enables fine-grained, user-controllable preference adjustments and facilitates seamless integration of new preferences without retraining. Experiments show that our proposed Preference Vector framework improves helpfulness without excessive conservatism, allows smooth control over preference trade-offs, and supports scalable multi-preference alignment.
Submitted 27 April, 2025;
originally announced April 2025.
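The task-arithmetic idea behind Preference Vector can be sketched with model parameters stored as dictionaries of NumPy arrays. This assumes the simplest extraction (fine-tuned weights minus base weights) and a user-weighted sum at test time; the paper's exact extraction and merging may differ, and the function names are hypothetical.

```python
import numpy as np

def preference_vector(finetuned, base):
    """Behavior shift for one preference: parameter-wise difference from the base model."""
    return {name: finetuned[name] - base[name] for name in base}

def merge(base, vectors, weights):
    """theta = theta_base + sum_i w_i * v_i, with user-chosen weights w_i."""
    merged = {name: p.copy() for name, p in base.items()}   # leave the base model untouched
    for vec, w in zip(vectors, weights):
        for name in merged:
            merged[name] = merged[name] + w * vec[name]
    return merged
```

Adjusting a weight trades one preference against another at test time, and a new preference only adds one more vector to the sum, so no retraining of the existing ones is needed.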
-
On Stopping Times of Power-one Sequential Tests: Tight Lower and Upper Bounds
Authors:
Shubhada Agrawal,
Aaditya Ramdas
Abstract:
We prove two lower bounds for stopping times of sequential tests between general composite nulls and alternatives. The first lower bound is for the setting where the type-1 error level $α$ approaches zero, and equals $\log(1/α)$ divided by a certain infimum KL divergence, termed $\operatorname{KL_{inf}}$. The second lower bound applies to the setting where $α$ is fixed and $\operatorname{KL_{inf}}$ approaches 0 (meaning that the null and alternative sets are not separated) and equals $c \operatorname{KL_{inf}}^{-1} \log \log \operatorname{KL_{inf}}^{-1}$ for a universal constant $c > 0$. We also provide a sufficient condition for matching the upper bounds and show that this condition is met in several special cases. Given past work, these upper and lower bounds are unsurprising in their form; our main contribution is the generality in which they hold, for example, not requiring reference measures or compactness of the classes.
Submitted 28 April, 2025;
originally announced April 2025.
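To give a sense of scale for the first lower bound (the $\alpha \to 0$ regime), consider testing the Gaussian null $\{N(\mu, 1) : \mu \le 0\}$ against the alternative $N(\mu_1, 1)$ with $\mu_1 > 0$. Since $\mathrm{KL}(N(\mu_1,\sigma^2) \,\|\, N(\mu_0,\sigma^2)) = (\mu_1-\mu_0)^2/(2\sigma^2)$, the infimum over the null is attained at $\mu_0 = 0$. The helpers below treat $\log(1/\alpha)/\operatorname{KL_{inf}}$ as a leading-order quantity only; the example and the function names are illustrative, not taken from the paper.

```python
import math

def gaussian_kl_inf(mu_alt, sigma=1.0):
    """KL_inf from N(mu_alt, sigma^2) to the null {N(mu, sigma^2): mu <= 0}."""
    if mu_alt <= 0:
        raise ValueError("the alternative mean must lie outside the null (mu_alt > 0)")
    # (mu_alt - mu0)^2 / (2 sigma^2) is minimized over mu0 <= 0 at mu0 = 0.
    return mu_alt ** 2 / (2 * sigma ** 2)

def small_alpha_lower_bound(alpha, kl_inf):
    """Leading-order stopping-time lower bound as alpha -> 0: log(1/alpha) / KL_inf."""
    return math.log(1 / alpha) / kl_inf
```

For $\mu_1 = 0.5$ and $\alpha = 0.01$, this gives $\operatorname{KL_{inf}} = 0.125$ and $\log(100)/0.125 \approx 36.8$ samples at leading order. Halving $\mu_1$ quarters $\operatorname{KL_{inf}}$ and so quadruples the bound.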
-
SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation
Authors:
Saransh Agrawal,
Kuan-Hao Huang
Abstract:
Large language models (LLMs) frequently memorize sensitive information during training, posing risks when deploying publicly accessible models. Current machine unlearning methods struggle to selectively remove specific data associations without degrading overall model capabilities. This paper presents our solution to SemEval-2025 Task 4 on targeted unlearning, which introduces a two-stage methodology that combines causal mediation analysis with layer-specific optimization. Through systematic causal tracing experiments on OLMo architectures (1B and 7B parameters), we identify the critical role of the first few transformer layers (layers 0-5) in storing subject-attribute associations within MLP modules. Building on this insight, we develop a constrained optimization approach that freezes upper layers while applying a novel joint loss function to lower layers, simultaneously maximizing forget set loss via output token cross-entropy penalties and minimizing retain set deviation through adaptive regularization. Our method achieves 2nd place in the 1B model track, demonstrating strong task performance while maintaining 88% of baseline MMLU accuracy. These results establish causal-informed layer optimization as a promising paradigm for efficient, precise unlearning in LLMs, offering a significant step forward in addressing data privacy concerns in AI systems.
Submitted 17 April, 2025;
originally announced April 2025.
-
Multilingual Contextualization of Large Language Models for Document-Level Machine Translation
Authors:
Miguel Moura Ramos,
Patrick Fernandes,
Sweta Agrawal,
André F. T. Martins
Abstract:
Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation, but scaling to document-level translation remains challenging, particularly in modeling long-range dependencies and discourse phenomena across sentences and paragraphs. In this work, we propose a method to improve LLM-based long-document translation through targeted fine-tuning on high-quality document-level data, which we curate and introduce as DocBlocks. Our approach supports multiple translation paradigms, including direct document-to-document and chunk-level translation, by integrating instructions both with and without surrounding context. This enables models to better capture cross-sentence dependencies while maintaining strong sentence-level translation performance. Experimental results show that incorporating multiple translation paradigms improves document-level translation quality and inference speed compared to prompting and agent-based methods.
Submitted 16 April, 2025;
originally announced April 2025.
-
Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation
Authors:
Julia Kreutzer,
Eleftheria Briakou,
Sweta Agrawal,
Marzieh Fadaee,
Kocmi Tom
Abstract:
Generation capabilities and language coverage of multilingual large language models (mLLMs) are advancing rapidly. However, evaluation practices for generative abilities of mLLMs are still lacking comprehensiveness, scientific rigor, and consistent adoption across research labs, which undermines their potential to meaningfully guide mLLM development. We draw parallels with machine translation (MT) evaluation, a field that faced similar challenges and has, over decades, developed transparent reporting standards and reliable evaluations for multilingual generative models. Through targeted experiments across key stages of the generative evaluation pipeline, we demonstrate how best practices from MT evaluation can deepen the understanding of quality differences between models. Additionally, we identify essential components for robust meta-evaluation of mLLMs, ensuring the evaluation methods themselves are rigorously assessed. We distill these insights into a checklist of actionable recommendations for mLLM research and development.
Submitted 17 April, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Neural Control Barrier Functions from Physics Informed Neural Networks
Authors:
Shreenabh Agrawal,
Manan Tayal,
Aditya Singh,
Shishir Kolathaya
Abstract:
As autonomous systems become increasingly prevalent in daily life, ensuring their safety is paramount. Control Barrier Functions (CBFs) have emerged as an effective tool for guaranteeing safety; however, manually designing them for specific applications remains a significant challenge. With the advent of deep learning techniques, recent research has explored synthesizing CBFs using neural networks, commonly referred to as neural CBFs. This paper introduces a novel class of neural CBFs that leverages a physics-inspired neural network framework by incorporating Zubov's Partial Differential Equation (PDE) within the context of safety. This approach provides a scalable methodology for synthesizing neural CBFs applicable to high-dimensional systems. Furthermore, by utilizing reciprocal CBFs instead of zeroing CBFs, the proposed framework allows for the specification of flexible, user-defined safe regions. To validate the effectiveness of the approach, we present case studies on three different systems: an inverted pendulum, autonomous ground navigation, and aerial navigation in obstacle-laden environments.
Submitted 15 April, 2025;
originally announced April 2025.
-
Soft theorems and spontaneous symmetry breaking
Authors:
Shreyansh Agrawal,
Kevin Nguyen
Abstract:
The soft photon and soft graviton theorems of Weinberg are known to derive from conservation laws associated with asymptotic symmetries. Within the corresponding classical theories, one often speaks of spontaneous symmetry breaking and vacuum degeneracy, but a genuine quantum description of this phenomenon has largely been lacking. Here we establish spontaneous breaking of asymptotic symmetries and the existence of Goldstone 'particles' using exclusively the language of quantum field theory. This is made possible through the reformulation of massless scattering theory in terms of carrollian conformal field theory, and the observation that soft theorems correspond to Ward identities of broken symmetries. A suitable version of Goldstone's theorem shows that there must exist zero-momentum particles described by conformal fields on the celestial sphere, in agreement with the common lore. More specifically, these belong to unitary representations in the discrete series of the Lorentz group, and are therefore naturally equipped with logarithmic two-point functions. We discuss the relevance of these observations to the problem of infrared divergences that scattering amplitudes suffer from.
Submitted 26 May, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering
Authors:
Patrick Fernandes,
Sweta Agrawal,
Emmanouil Zaranis,
André F. T. Martins,
Graham Neubig
Abstract:
Despite the steady progress in machine translation evaluation, existing automatic metrics struggle to capture how well meaning is preserved beyond sentence boundaries. We posit that reliance on a single intrinsic quality score, trained to mimic human judgments, might be insufficient for evaluating translations of long, complex passages, and a more "pragmatic" approach that assesses how accurately key information is conveyed by a translation in context is needed. We introduce TREQA (Translation Evaluation via Question-Answering), a framework that extrinsically evaluates translation quality by assessing how accurately candidate translations answer reading comprehension questions that target key information in the original source or reference texts. In challenging domains that require long-range understanding, such as literary texts, we show that TREQA is competitive with and, in some cases, outperforms state-of-the-art neural and LLM-based metrics in ranking alternative paragraph-level translations, despite never being explicitly optimized to correlate with human judgments. Furthermore, the generated questions and answers offer interpretability: empirical analysis shows that they effectively target translation errors identified by experts in evaluated datasets. Our code is available at https://github.com/deep-spin/treqa
Submitted 11 April, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
Lattice Based Crypto breaks in a Superposition of Spacetimes
Authors:
Divesh Aggarwal,
Shashwat Agrawal,
Rajendra Kumar
Abstract:
We explore the computational implications of a superposition of spacetimes, a phenomenon hypothesized in quantum gravity theories. This line of work was initiated by Shmueli (2024), who introduced the complexity class $\mathbf{BQP^{OI}}$, consisting of promise problems decidable by quantum polynomial-time algorithms with access to an oracle for computing order interference. That work showed that the Graph Isomorphism problem and the Gap Closest Vector Problem (with approximation factor $\mathcal{O}(n^{3/2})$) are in $\mathbf{BQP^{OI}}$. We extend this result by showing that the entire complexity class $\mathbf{SZK}$ (Statistical Zero Knowledge) is contained within $\mathbf{BQP^{OI}}$. This immediately implies that the security of numerous lattice-based cryptography schemes would be compromised in a computational model based on a superposition of spacetimes, since these schemes often rely on the hardness of the Learning with Errors problem, which is in $\mathbf{SZK}$.
Submitted 1 April, 2025; v1 submitted 27 March, 2025;
originally announced March 2025.
-
Pedestrians and Robots: A Novel Dataset for Learning Distinct Social Navigation Forces
Authors:
Subham Agrawal,
Nico Ostermann-Myrau,
Nils Dengler,
Maren Bennewitz
Abstract:
The increasing use of robots in human-centric public spaces, such as shopping malls, sidewalks, and hospitals, requires an understanding of how pedestrians respond to their presence. However, existing research lacks comprehensive datasets that capture the full range of pedestrian behaviors in the presence of robots, including avoidance, neutrality, and attraction. Such datasets can be used to learn models that accurately predict the diverse responses of pedestrians to robot presence, which are crucial for advancing robot navigation strategies and optimizing pedestrian-aware motion planning. In this paper, we address these challenges by collecting a novel dataset of pedestrian motion at two outdoor locations under three distinct conditions: no robot present, a stationary robot, and a moving robot. Thus, unlike existing datasets, ours explicitly captures variations in pedestrian behavior across the different robot conditions. Using our dataset, we propose a novel Neural Social Robot Force Model (NSRFM), an extension of the traditional Social Force Model that integrates neural networks and robot-induced forces to better predict pedestrian behavior in the presence of robots. We validate the NSRFM by comparing its generated trajectories on different real-world datasets. Furthermore, we implement it in simulation to enable the learning and benchmarking of robot navigation strategies based on their impact on pedestrian movement. Our results demonstrate the model's effectiveness in replicating real-world pedestrian reactions and its utility in developing, evaluating, and benchmarking social robot navigation algorithms.
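As a rough illustration of the classical Social Force Model that the NSRFM extends, the sketch below advances pedestrians by one time step under a goal-driving force, exponential pedestrian-pedestrian repulsion, and an added robot term. It is a simplified Helbing-style formulation written for this listing, not the authors' model: the neural-network components are omitted, and the robot term, the function name `social_force_step`, and all parameter values (`v0`, `tau`, `A`, `B`, `A_r`, `B_r`) are hypothetical placeholders.

```python
import numpy as np

def social_force_step(pos, vel, goal, robot_pos=None, dt=0.1,
                      v0=1.3, tau=0.5, A=2.0, B=0.3, A_r=3.0, B_r=0.5):
    """One semi-implicit Euler step of a simplified social force model.

    pos, vel, goal: (N, 2) arrays for N pedestrians; robot_pos: (2,) or None.
    All parameter values are illustrative placeholders, not fitted values.
    Hypothetical robot term: A_r > 0 repels (avoidance), A_r < 0 attracts,
    A_r = 0 is neutral.
    """
    # Driving force: relax toward desired speed v0 along the unit vector to the goal.
    to_goal = goal - pos
    e = to_goal / np.maximum(np.linalg.norm(to_goal, axis=1, keepdims=True), 1e-9)
    force = (v0 * e - vel) / tau

    # Pedestrian-pedestrian repulsion A * exp(-d / B), pointing away from each neighbor.
    diff = pos[:, None, :] - pos[None, :, :]   # (N, N, 2) separation vectors
    d = np.linalg.norm(diff, axis=2)            # (N, N) pairwise distances
    np.fill_diagonal(d, np.inf)                 # exclude self-interaction
    unit = diff / np.maximum(np.where(np.isinf(d), 1.0, d), 1e-9)[..., None]
    force += ((A * np.exp(-d / B))[..., None] * unit).sum(axis=1)

    # Hypothetical robot-induced term with the same exponential form.
    if robot_pos is not None:
        rd = pos - robot_pos
        dr = np.maximum(np.linalg.norm(rd, axis=1, keepdims=True), 1e-9)
        force += A_r * np.exp(-dr / B_r) * rd / dr

    # Update velocity first, then position with the new velocity.
    new_vel = vel + dt * force
    return pos + dt * new_vel, new_vel
```

Setting `A_r` positive, negative, or to zero mimics avoidance, attraction, or neutrality, respectively; a learned model of the kind the abstract describes would presumably replace or correct such hand-set terms with fitted components.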
Submitted 5 March, 2025;
originally announced March 2025.
-
Learning-based Estimation of Forward Kinematics for an Orthotic Parallel Robotic Mechanism
Authors:
Jingzong Zhou,
Yuhan Zhu,
Xiaobin Zhang,
Sunil Agrawal,
Konstantinos Karydis
Abstract:
This paper introduces a 3D parallel robot with three identical five-degree-of-freedom chains connected to a circular brace end-effector, intended to serve as an assistive device for patients with cervical spondylosis. The inverse kinematics of the system is solved analytically, whereas learning-based methods are used to solve the forward kinematics. The methods considered include a Koopman operator-based approach and a neural network-based approach. The task is to predict the position and orientation of the end-effector along trajectories. The training dataset is generated from the analytical inverse-kinematics solutions. The methods are tested both in simulation and in physical hardware experiments with the developed robot. The results validate the suitability of learning-based methods for the forward kinematics of parallel mechanisms, which is generally hard to resolve analytically.
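The abstract does not detail the Koopman operator-based approach. A common way to fit a finite-dimensional Koopman approximation from data is extended dynamic mode decomposition (EDMD), sketched below as a least-squares problem on lifted snapshot pairs. How the paper chooses states, observables, or the mapping to end-effector pose is not stated here, so the function names and the quadratic dictionary `poly_lift` are illustrative assumptions.

```python
import numpy as np

def edmd(X, Y, lift):
    """Extended DMD: least-squares Koopman matrix K with lift(Y) ~= lift(X) @ K.T.

    X, Y: (n_samples, n_state) arrays of consecutive snapshot pairs (x_k, x_{k+1}).
    lift: callable mapping (n_samples, n_state) -> (n_samples, n_features).
    """
    PX, PY = lift(X), lift(Y)
    # Solve PX @ B ~= PY in the least-squares sense; the Koopman matrix is B.T.
    B, *_ = np.linalg.lstsq(PX, PY, rcond=None)
    return B.T

def poly_lift(X):
    """Hypothetical observable dictionary: constant, linear, and quadratic monomials."""
    n = X.shape[1]
    quad = [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack([np.ones(len(X)), X, *quad])
```

For a linear system with the identity lifting, EDMD recovers the system matrix exactly from noise-free data; richer dictionaries such as `poly_lift` are what allow a linear operator to approximate nonlinear kinematics.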
Submitted 14 March, 2025;
originally announced March 2025.
-
Sustaining Human Agency, Attending to Its Cost: An Investigation into Generative AI Design for Non-Native Speakers' Language Use
Authors:
Yimin Xiao,
Cartor Hancock,
Sweta Agrawal,
Nikita Mehandru,
Niloufar Salehi,
Marine Carpuat,
Ge Gao
Abstract:
AI systems and tools today can generate human-like expressions on behalf of people. This raises the crucial question of how to sustain human agency in AI-mediated communication. We investigated this question in the context of machine translation (MT)-assisted conversations with 45 participant dyads. Each dyad consisted of one new immigrant in the United States, who used MT for English information seeking as a non-native speaker, and one local native speaker, who acted as the information provider. Non-native speakers could influence the English production of their messages in one of three ways: labeling the quality of MT outputs, regular post-editing without additional hints, or augmented post-editing with LLM-generated hints. Our data revealed greater exercise of non-native speakers' agency under the two post-editing conditions. This benefit, however, came at a significant cost to dyad-level communication performance. From our findings, we derive insights for the design of MT and other generative AI tools.
Submitted 10 March, 2025;
originally announced March 2025.
-
Beyond Next Word Prediction: Developing Comprehensive Evaluation Frameworks for measuring LLM performance on real world applications
Authors:
Vishakha Agrawal,
Archie Chaudhury,
Shreya Agrawal
Abstract:
While Large Language Models (LLMs) are fundamentally next-token prediction systems, their practical applications extend far beyond this basic function. From natural language processing and text generation to conversational assistants and software use, LLMs have numerous use cases and have already seen significant enterprise adoption. To evaluate such models, static evaluation datasets, consisting of a set of prompts and their corresponding ground truths, are often used to benchmark a model's efficacy on a particular task. In this paper, we provide the basis for a more comprehensive evaluation framework, built on a traditional game- and tool-based architecture, that enables a broader measurement of a model's capabilities. For simplicity, we provide a generalized foundation that can be extended, without significant alteration, to numerous scenarios, from specific use cases such as supply chain management or financial reasoning to abstract measurements such as ethics or safety.
Submitted 5 March, 2025;
originally announced March 2025.
-
Aerial Infrared Health Monitoring of Solar Photovoltaic Farms at Scale
Authors:
Isaac Corley,
Conor Wallace,
Sourav Agrawal,
Burton Putrah,
Jonathan Lwowski
Abstract:
Solar photovoltaic (PV) farms represent a major source of global renewable energy generation, yet their true operational efficiency often remains unknown at scale. In this paper, we present a comprehensive, data-driven framework for large-scale airborne infrared inspection of North American solar installations. Leveraging high-resolution thermal imagery, we construct and curate a geographically diverse dataset encompassing thousands of PV sites, enabling machine learning-based detection and localization of defects that are not detectable in the visible spectrum. Our pipeline integrates advanced image processing, georeferencing, and airborne thermal infrared anomaly detection to provide rigorous estimates of performance losses. We highlight practical considerations in aerial data collection, annotation methodologies, and model deployment across a wide range of environmental and operational conditions. Our work delivers new insights into the reliability of large-scale solar assets and serves as a foundation for ongoing research on performance trends, predictive maintenance, and scalable analytics in the renewable energy sector.
Submitted 3 March, 2025;
originally announced March 2025.
-
Water dissociation and rotational broadening in the atmosphere of KELT-20 b from high-resolution spectroscopy
Authors:
Luke Finnerty,
Yinzi Xin,
Jerry W. Xuan,
Julie Inglis,
Michael P. Fitzgerald,
Shubh Agrawal,
Ashley Baker,
Randall Bartos,
Geoffrey A. Blake,
Benjamin Calvin,
Sylvain Cetre,
Jacques-Robert Delorme,
Greg Doppmann,
Daniel Echeverri,
Katelyn Horstman,
Chih-Chun Hsu,
Nemanja Jovanovic,
Joshua Liberman,
Ronald A. López,
Emily C. Martin,
Dimitri Mawet,
Evan Morris,
Jacklyn Pezzato,
Jean-Baptiste Ruffio,
Ben Sappey
, et al. (7 additional authors not shown)
Abstract:
We present atmospheric retrievals from Keck/KPIC phase II observations of the ultra-hot Jupiter KELT-20/MASCARA-2 b. Previous free retrievals of molecular abundances for ultra-hot Jupiters have been affected by significant model biases due to variations in vertical abundance profiles, which we address by including molecular dissociation in our retrieval framework as an additional free parameter. We measure the abundance of CO ($\rm \log CO_{MMR} = -2.5^{+0.6}_{-0.5}$) and obtain a lower limit on the abundance of H$_2$O ($\rm \log H{_2}O_{MMR} = -1.5^{+0.8}_{-1.0}$, $>-3.0$ at 95% confidence) in the atmosphere of KELT-20 b. These abundances yield an atmospheric $\rm C/O = 0.1^{+0.4}_{-0.1}$ ($\rm C/O < 0.9$ at 95% confidence) and suggest a metallicity between approximately solar and $10\times$ solar. H$_2$O is dissociated at pressures below $\log P_{\rm H_2O} = -1.2^{+0.5}_{-0.7}$ bar, roughly consistent with predictions from chemical equilibrium models, which suggests that the retrieved composition is not an artifact of assumptions about the vertical mixing profiles. We also constrain the rotational velocity of KELT-20 b to $v\sin i = 7.5\pm0.7$ km s$^{-1}$, suggesting the presence of a jet, with a speed comparable to the sound speed, in the direction of the planet's rotation, assuming the planet is tidally locked.
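The C/O ratio quoted above is a ratio of number densities, while the retrieved abundances are mass mixing ratios (MMRs). A minimal conversion, assuming carbon is carried only by CO and oxygen only by CO and H$_2$O, divides each MMR by its molecular mass. The function name and this two-species simplification are assumptions for illustration; the paper's value comes from its full posterior, not from plugging in median values.

```python
import math

# Molecular masses in atomic mass units.
MU = {"CO": 28.010, "H2O": 18.015}

def c_to_o(log_mmr_co: float, log_mmr_h2o: float) -> float:
    """Number-density C/O ratio from log10 mass mixing ratios.

    Simplifying assumption: C is only in CO; O is only in CO and H2O.
    """
    n_co = 10 ** log_mmr_co / MU["CO"]      # relative number density of CO
    n_h2o = 10 ** log_mmr_h2o / MU["H2O"]   # relative number density of H2O
    carbon = n_co
    oxygen = n_co + n_h2o
    return carbon / oxygen
```

With the median values above ($\rm \log CO_{MMR} = -2.5$, $\rm \log H_2O_{MMR} = -1.5$), this gives roughly 0.06, consistent with the quoted $0.1^{+0.4}_{-0.1}$.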
Submitted 3 March, 2025;
originally announced March 2025.
-
Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models
Authors:
Susmit Agrawal,
Deepika Vemuri,
Sri Siddarth Chakaravarthy P,
Vineeth N. Balasubramanian
Abstract:
Concept-based methods have emerged as a promising direction for developing interpretable neural networks in standard supervised settings. However, most works that study them in incremental settings assume either a static concept set across all experiences or that each experience relies on a distinct set of concepts. In this work, we study concept-based models in a more realistic, dynamic setting where new classes may rely on older concepts in addition to introducing new concepts themselves. We show that concepts and classes form a complex web of relationships, which is susceptible to degradation and needs to be preserved and augmented across experiences. We introduce new metrics to show that existing concept-based models cannot preserve these relationships even when trained using methods to prevent catastrophic forgetting, since they cannot handle forgetting at the concept, class, and concept-class relationship levels simultaneously. To address these issues, we propose a novel method, MuCIL, that uses multimodal concepts to perform classification without increasing the number of trainable parameters across experiences. The multimodal concepts are aligned to concepts provided in natural language, making them interpretable by design. Through extensive experimentation, we show that our approach obtains state-of-the-art classification performance compared to other concept-based models, achieving over 2$\times$ the classification performance in some cases. We also study the ability of our model to perform interventions on concepts, and show that it can localize visual concepts in input images, providing post-hoc interpretations.
Submitted 27 February, 2025;
originally announced February 2025.
-
Defects in the $β$-Ga$_2$O$_3$($\bar201$)/HfO$_2$ MOS system and the effect of thermal treatments
Authors:
Khushabu. S. Agrawal,
Paolo LaTorraca,
Jonas Valentijn,
Roberta Hawkins,
Adam A. Gruszecki,
Joy Roy,
Vasily Lebedev,
Lewys Jones,
Robert M. Wallace,
Chadwin D. Young,
Paul K. Hurley,
Karim Cherkaoui
Abstract:
We have investigated the properties of the $β$-Ga$_2$O$_3$($\bar201$)/HfO$_2$/Cr/Au MOS (metal-oxide-semiconductor) system after annealing (450$^\circ$C) in different ambient conditions (forming gas, N$_2$ and O$_2$). Defect properties have been analyzed using an approach combining experimental impedance measurements with physics-based simulations of the capacitance-voltage (C-V) and conductance-voltage (G-V) characteristics of $β$-Ga$_2$O$_3$/HfO$_2$ MOS capacitors. This approach enabled us to detect two defect bands in HfO$_2$, characterized by thermal ionization energies of ~1.1 eV (acceptor-like) and ~2 eV (donor-like), attributed to a polaronic self-trapping state and an oxygen vacancy in HfO$_2$, respectively. This study demonstrates how thermal treatments affect the energy distributions and densities of the observed defects. The adopted methodology also enabled the extraction of the spatial distribution of defects across the HfO$_2$ thickness and at the Cr/HfO$_2$ interface. The high concentration of oxygen vacancies close to the Cr/HfO$_2$ interface, extracted from experimental and simulated electrical data, is confirmed by in-situ XPS analysis, which shows that Cr scavenges oxygen from the HfO$_2$, creating the donor band confined near the Cr/HfO$_2$ interface. According to the simulations, this donor band density is reduced after annealing and remains unchanged across the different annealing conditions. We speculate that this may be due to the formation of dense films and polymorphs of HfO$_2$ under the different ambients, as revealed by high-resolution TEM images.
Submitted 24 February, 2025;
originally announced February 2025.
-
Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
Authors:
António Farinhas,
Nuno M. Guerreiro,
Sweta Agrawal,
Ricardo Rei,
André F. T. Martins
Abstract:
Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimation (QE) metrics as deferral rules. We show that QE-based deferral allows a cascaded system to match the performance of a larger model while invoking it for a small fraction (30% to 50%) of the examples, significantly reducing computational costs. We validate this approach through both automatic and human evaluation.
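A minimal sketch of the cascading idea described above: translate every input with the small model, score the output with a reference-free QE metric, and defer to the large model only when the score falls below a threshold. The callables `small_mt`, `large_mt`, and `qe_score` are hypothetical stand-ins for real MT systems and a real QE metric, and the fixed threshold is an assumption; in practice it would be tuned to reach a target deferral rate.

```python
from typing import Callable, List, Tuple

def cascade_translate(
    sources: List[str],
    small_mt: Callable[[str], str],
    large_mt: Callable[[str], str],
    qe_score: Callable[[str, str], float],
    threshold: float,
) -> Tuple[List[str], float]:
    """Translate with the small model; defer to the large model when QE is low.

    Returns the final outputs and the fraction of inputs that were deferred.
    """
    outputs: List[str] = []
    deferred = 0
    for src in sources:
        hyp = small_mt(src)
        # Reference-free quality estimate of the cheap hypothesis.
        if qe_score(src, hyp) < threshold:
            hyp = large_mt(src)  # invoke the expensive model only for low-QE cases
            deferred += 1
        outputs.append(hyp)
    return outputs, deferred / max(len(sources), 1)
```

The deferral rate returned alongside the outputs is what determines the compute savings: the large model runs only on that fraction of the inputs.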
Submitted 18 February, 2025;
originally announced February 2025.
-
Time Series Treatment Effects Analysis with Always-Missing Controls
Authors:
Juan Shu,
Qiyu Han,
George Chen,
Xihao Cao,
Kangming Luo,
Dan Pallotta,
Shivam Agrawal,
Yuping Lu,
Xiaoyu Zhang,
Jawad Mansoor,
Jyoti Anand
Abstract:
Estimating treatment effects in time series data presents a significant challenge, especially when the control group is always unobservable. For example, in analyzing the effects of Christmas on retail sales, we lack direct observation of what would have occurred in late December without the Christmas impact. To address this, we try to recover the control group in the event period while accounting for confounders and temporal dependencies. Experimental results on the M5 Walmart retail sales data demonstrate robust estimation of the potential outcome of the control group as well as an accurately predicted holiday effect. Furthermore, we provide theoretical guarantees for the estimated treatment effect, proving its consistency and asymptotic normality. The proposed methodology is applicable not only to this always-missing control scenario but also to other conventional time series causal inference settings.
Submitted 17 February, 2025;
originally announced February 2025.
-
Fenchel-Young Variational Learning
Authors:
Sophia Sklaviadis,
Sweta Agrawal,
Antonio Farinhas,
Andre Martins,
Mario Figueiredo
Abstract:
From a variational perspective, many statistical learning criteria involve seeking a distribution that balances empirical risk and regularization. In this paper, we broaden this perspective by introducing a new general class of variational methods based on Fenchel-Young (FY) losses, treated as divergences that generalize (and encompass) the familiar Kullback-Leibler divergence at the core of classical variational learning. Our proposed formulation -- FY variational learning -- includes as key ingredients new notions of FY free energy, FY evidence, FY evidence lower bound, and FY posterior. We derive alternating minimization and gradient backpropagation algorithms to compute (or lower bound) the FY evidence, which enables learning a wider class of models than previous variational formulations. This leads to generalized FY variants of classical algorithms, such as an FY expectation-maximization (FYEM) algorithm, and latent-variable models, such as an FY variational autoencoder (FYVAE). Our new methods are shown to be empirically competitive, often outperforming their classical counterparts, and most importantly, to have qualitatively novel features. For example, FYEM has an adaptively sparse E-step, while the FYVAE can support models with sparse observations and sparse posteriors.
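Fenchel-Young losses are defined from a regularizer $\Omega$ as $L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$, where $\Omega^*$ is the convex conjugate. The sketch below evaluates two standard instances: with the Shannon negentropy, the loss reduces to the Kullback-Leibler divergence between $y$ and $\mathrm{softmax}(\theta)$; with $\Omega(p) = \tfrac{1}{2}\|p\|^2$ restricted to the probability simplex, the prediction map is sparsemax, which can assign exactly zero probability. This illustrates the kind of sparsity the abstract mentions; it is not the authors' FYEM or FYVAE code, and the function names are hypothetical.

```python
import numpy as np

def sparsemax(theta):
    """Euclidean projection of theta onto the probability simplex."""
    z = np.sort(theta)[::-1]                 # scores in descending order
    cssv = np.cumsum(z)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z > cssv               # True on a prefix of sorted entries
    k_max = k[support][-1]                   # size of the support
    tau = (cssv[k_max - 1] - 1) / k_max      # threshold subtracted from scores
    return np.maximum(theta - tau, 0.0)

def fy_loss_sparsemax(theta, y):
    """FY loss for Omega(p) = 0.5 * ||p||^2 on the simplex."""
    p = sparsemax(theta)
    omega_conj = theta @ p - 0.5 * p @ p     # Omega*(theta), attained at p
    omega_y = 0.5 * y @ y
    return omega_conj + omega_y - theta @ y

def fy_loss_shannon(theta, y):
    """FY loss for the Shannon negentropy; equals KL(y || softmax(theta))."""
    m = theta.max()
    lse = m + np.log(np.exp(theta - m).sum())   # Omega*(theta) = logsumexp
    nz = y > 0
    neg_entropy = np.sum(y[nz] * np.log(y[nz]))
    return lse + neg_entropy - theta @ y
```

For $\theta = (2, 1, -1)$, sparsemax puts all mass on the first entry, whereas softmax would give every entry positive probability; both losses are zero exactly when $y$ equals the corresponding prediction.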
Submitted 14 February, 2025;
originally announced February 2025.
-
A New Rejection Sampling Approach to $k$-$\mathtt{means}$++ With Improved Trade-Offs
Authors:
Poojan Shah,
Shashwat Agrawal,
Ragesh Jaiswal
Abstract:
The $k$-$\mathtt{means}$++ seeding algorithm (Arthur & Vassilvitskii, 2007) is widely used in practice for the $k$-means clustering problem where the goal is to cluster a dataset $\mathcal{X} \subset \mathbb{R} ^d$ into $k$ clusters.
The popularity of this algorithm is due to its simplicity and provable guarantee of being $O(\log k)$ competitive with the optimal solution in expectation. However, its running time is $O(|\mathcal{X}|kd)$, making it expensive for large datasets.
In this work, we present a simple and effective rejection sampling based approach for speeding up $k$-$\mathtt{means}$++.
Our first method runs in time $\tilde{O}(\mathtt{nnz} (\mathcal{X}) + βk^2d)$ while still being $O(\log k )$ competitive in expectation. Here, $β$ is a parameter which is the ratio of the variance of the dataset to the optimal $k$-$\mathtt{means}$ cost in expectation and $\tilde{O}$ hides logarithmic factors in $k$ and $|\mathcal{X}|$.
Our second method presents a new trade-off between computational cost and solution quality. It incurs an additional scale-invariant term of $k^{-Ω(m/β)} \operatorname{Var}(\mathcal{X})$ on top of the $O(\log k)$ guarantee of $k$-$\mathtt{means}$++, improving upon a result of Bachem et al. (2016a), who incur an additional term of $m^{-1}\operatorname{Var}(\mathcal{X})$, while still running in time $\tilde{O}(\mathtt{nnz}(\mathcal{X}) + mk^2d)$. We perform extensive empirical evaluations to validate our theoretical results and to show the effectiveness of our approach on real datasets.
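For reference, the baseline $k$-$\mathtt{means}$++ seeding ($D^2$ sampling) that the abstract sets out to accelerate can be written as below: each new center is drawn with probability proportional to its squared distance to the nearest center chosen so far, and the distance update costs $O(|\mathcal{X}| d)$ per center, hence $O(|\mathcal{X}|kd)$ overall. This is the standard algorithm of Arthur & Vassilvitskii, not the rejection-sampling method proposed in the paper; the function name is hypothetical.

```python
import numpy as np

def kmeans_pp_seed(X, k, rng=None):
    """Standard k-means++ (D^2) seeding; returns a (k, d) array of centers."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # First center: uniform at random.
    first = rng.integers(n)
    centers = [X[first]]
    # Squared distance from every point to its nearest chosen center.
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(1, k):
        total = d2.sum()
        if total == 0:  # every point already coincides with a center
            idx = rng.integers(n)
        else:
            idx = rng.choice(n, p=d2 / total)  # D^2 sampling
        centers.append(X[idx])
        # O(n d) update: keep the distance to the closest center.
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centers)
```

The full pass over the data in each of the $k$ rounds is the $O(|\mathcal{X}|kd)$ cost the abstract refers to; a rejection-sampling scheme aims to avoid recomputing this distribution exactly at every step.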
Submitted 4 February, 2025;
originally announced February 2025.
-
Optimization Landscapes Learned: Proxy Networks Boost Convergence in Physics-based Inverse Problems
Authors:
Girnar Goyal,
Philipp Holl,
Sweta Agrawal,
Nils Thuerey
Abstract:
Solving inverse problems in physics is central to understanding complex systems and advancing technologies in various fields. Iterative optimization algorithms, commonly used to solve these problems, often encounter local minima, chaos, or regions with zero gradients. This is due to their overreliance on local information and to highly chaotic inverse loss landscapes governed by underlying partial differential equations (PDEs). In this work, we show that deep neural networks successfully replicate such complex loss landscapes through spatio-temporal trajectory inputs. They also offer the potential to control the underlying complexity of these chaotic loss landscapes during training through various regularization methods. We show that optimizing on network-smoothed loss landscapes leads to improved convergence in predicting optimal inverse parameters, compared with conventional optimizers such as BFGS, on multiple challenging problems.
Submitted 27 January, 2025;
originally announced January 2025.
-
Analyzing Memorization in Large Language Models through the Lens of Model Attribution
Authors:
Tarun Ram Menta,
Susmit Agrawal,
Chirag Agarwal
Abstract:
Large Language Models (LLMs) are prevalent in modern applications but often memorize training data, leading to privacy breaches and copyright issues. Existing research has mainly focused on post-hoc analyses, such as extracting memorized content or developing memorization metrics, without exploring the underlying architectural factors that contribute to memorization. In this work, we investigate memorization from an architectural lens by analyzing how attention modules at different layers affect a model's memorization and generalization performance. Using attribution techniques, we systematically intervene in the LLM architecture by bypassing attention modules at specific blocks while keeping other components, such as layer normalization and MLP transformations, intact. We provide theorems analyzing our intervention mechanism from a mathematical view, bounding the difference in layer outputs with and without our attributions. Our theoretical and empirical analyses reveal that attention modules in deeper transformer blocks are primarily responsible for memorization, whereas earlier blocks are crucial for the model's generalization and reasoning capabilities. We validate our findings through comprehensive experiments on different LLM families (Pythia and GPTNeo) and five benchmark datasets. Our insights offer a practical approach to mitigating memorization in LLMs while preserving their performance, contributing to safer and more ethical deployment in real-world applications.
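A toy illustration of the intervention described above, assuming a pre-LayerNorm residual transformer block: bypassing attention in a given block drops its attention residual update, while the block's LayerNorm and MLP path still run. The single-head NumPy implementation, the parameter layout, and the names `block` and `forward` are assumptions for illustration; they are not the Pythia or GPTNeo code used in the paper.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (seq_len, d) input."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones_like(scores, dtype=bool), 1)  # hide future tokens
    scores = np.where(mask, -np.inf, scores)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

def block(x, params, bypass_attn=False):
    # Attention residual branch, skipped entirely when bypassed.
    if not bypass_attn:
        x = x + attention(layer_norm(x), *params["attn"])
    # LayerNorm + ReLU MLP residual branch is always kept.
    h = layer_norm(x) @ params["W1"]
    return x + np.maximum(h, 0.0) @ params["W2"]

def forward(x, blocks, bypass_layers=()):
    """Run all blocks, bypassing attention in the listed block indices."""
    for i, p in enumerate(blocks):
        x = block(x, p, bypass_attn=i in bypass_layers)
    return x
```

Comparing `forward(x, blocks)` with `forward(x, blocks, bypass_layers={...})` for early versus late block indices mirrors, in miniature, the kind of per-block comparison the abstract describes.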
Submitted 9 January, 2025;
originally announced January 2025.
-
Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation
Authors:
Terrance Yu-Hao Chen,
Yulin Chen,
Pontus Soederhaell,
Sadrishya Agrawal,
Kateryna Shapovalenko
Abstract:
Decoding speech from non-invasive brain signals, such as electroencephalography (EEG), has the potential to advance brain-computer interfaces (BCIs), with applications in silent communication and assistive technologies for individuals with speech impairments. However, EEG-based speech decoding faces major challenges, such as noisy data, limited datasets, and poor performance on complex tasks like speech perception. This study attempts to address these challenges by employing variational autoencoders (VAEs) for EEG data augmentation to improve data quality and applying a state-of-the-art (SOTA) sequence-to-sequence deep learning architecture, originally successful in electromyography (EMG) tasks, to EEG-based speech decoding. Additionally, we adapt this architecture for word classification tasks. Using the Brennan dataset, which contains EEG recordings of subjects listening to narrated speech, we preprocess the data and evaluate both classification and sequence-to-sequence models for EEG-to-words/sentences tasks. Our experiments show that VAEs have the potential to reconstruct artificial EEG data for augmentation. Meanwhile, our sequence-to-sequence model achieves more promising performance in generating sentences compared to our classification model, though both remain challenging tasks. These findings lay the groundwork for future research on EEG speech perception decoding, with possible extensions to speech production tasks such as silent or imagined speech.
Submitted 8 January, 2025;
originally announced January 2025.
-
Prior2Posterior: Model Prior Correction for Long-Tailed Learning
Authors:
S Divakar Bhat,
Amit More,
Mudit Soni,
Surbhi Agrawal
Abstract:
Learning-based solutions for long-tailed recognition face difficulties in generalizing on balanced test datasets. Due to the imbalanced data prior, the learned a posteriori distribution is biased toward the most frequent (head) classes, leading to inferior performance on the least frequent (tail) classes. In general, performance can be improved by removing this bias, i.e., by eliminating the effect of the imbalanced prior modeled using the number of samples per class (class frequencies). We first observe that the effective prior on the classes, learned by the model at the end of training, can differ from the empirical prior obtained using class frequencies. We therefore propose a novel approach to accurately model the effective prior of a trained model using a posteriori probabilities. We propose to correct the imbalanced prior by adjusting the predicted a posteriori probabilities (Prior2Posterior: P2P) using the calculated prior in a post-hoc manner after training, and show that this can improve model performance. We present a theoretical analysis showing the optimality of our approach for models trained with naive cross-entropy loss as well as logit-adjusted loss. Our experiments show that the proposed approach achieves new state-of-the-art (SOTA) results on several benchmark datasets from the long-tail literature in the category of logit adjustment methods. Further, the proposed approach can be used to inspect any existing method to capture its effective prior and remove any residual bias to improve its performance, post-hoc, without model retraining. We also show that, using the proposed post-hoc approach, the performance of many existing methods can be improved further.
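A minimal sketch of the post-hoc correction idea, under two assumptions not stated in the abstract: the effective prior is estimated as the average predicted posterior over training samples, and the correction divides each predicted posterior by that prior and renormalizes (Bayes' rule toward a uniform target prior). The paper's exact estimator and adjustment may differ; the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def estimate_effective_prior(train_probs):
    """Assumed estimator: mean predicted posterior over training samples."""
    return train_probs.mean(axis=0)

def p2p_adjust(test_probs, prior):
    """Divide out the (effective) prior and renormalize each row."""
    adjusted = test_probs / prior
    return adjusted / adjusted.sum(axis=1, keepdims=True)
```

For instance, with an effective prior of (0.75, 0.25), a prediction of (0.6, 0.4) becomes (1/3, 2/3), flipping the decision toward the tail class; a uniform prior leaves predictions unchanged.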
Submitted 21 December, 2024;
originally announced December 2024.
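As a rough illustration of the post-hoc prior correction described in the Prior2Posterior (P2P) abstract above, the sketch below estimates an "effective prior" as the average of a model's predicted posteriors over a dataset and then divides it out of a new posterior before renormalising. This is an assumption-laden sketch, not the authors' estimator: the averaging rule, the function names, and the toy numbers are all hypothetical.

```python
from typing import List


def estimate_effective_prior(posteriors: List[List[float]]) -> List[float]:
    """Hypothetical estimator: average the predicted class posteriors over a dataset."""
    n = len(posteriors)
    k = len(posteriors[0])
    return [sum(p[c] for p in posteriors) / n for c in range(k)]


def correct_posterior(posterior: List[float], prior: List[float]) -> List[float]:
    """Divide out the prior and renormalise: p'(y|x) is proportional to p(y|x) / prior(y)."""
    scores = [p / q for p, q in zip(posterior, prior)]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    # Toy posteriors from a classifier biased toward class 0 (the "head" class).
    train_posteriors = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
    prior = estimate_effective_prior(train_posteriors)
    corrected = correct_posterior([0.5, 0.3, 0.2], prior)
    # After correction, class 2 (a tail class) has the largest probability.
    print(prior, corrected)
```

The design choice here is that no retraining is involved: only predicted probabilities are touched, which is the post-hoc property the abstract emphasises.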
-
True mass and atmospheric composition of the non-transiting hot Jupiter HD 143105 b
Authors:
Luke Finnerty,
Yinzi Xin,
Jerry W. Xuan,
Julie Inglis,
Michael P Fitzgerald,
Shubh Agrawal,
Ashley Baker,
Geoffrey A. Blake,
Benjamin Calvin,
Sylvain Cetre,
Jacques-Robert Delorme,
Greg Doppman,
Daniel Echeverri,
Katelyn Horstman,
Chih-Chun Hsu,
Nemanja Jovanovic,
Joshua Liberman,
Ronald A. López,
Emily C. Martin,
Dimitri Mawet,
Evan Morris,
Jacklyn Pezzato-Rovner,
Jean-Baptiste Ruffio,
Ben Sappey,
Tobias Schofield
, et al. (6 additional authors not shown)
Abstract:
We present Keck/KPIC phase II $K$-band observations of the non-transiting hot Jupiter HD 143105 b. Using a cross-correlation approach, we make the first detection of the planetary atmosphere, at $K_p = 185^{+11}_{-13}\rm\ km\ s^{-1}$, and find an inferior conjunction time 2.5 hours earlier than the previously published ephemeris. The retrieved $K_p$ value, combined with the orbital period, the mass of the host star, and the lack of a transit detection, gives an orbital inclination of $78^{+2}_{-12}$ degrees and a true planet mass of $1.23\pm0.10\rm\ M_J$. While the equilibrium temperature of HD 143105 b lies in the transition regime between non-inverted and inverted atmospheres, our analysis strongly prefers a non-inverted atmosphere. Retrieval analysis indicates that the atmosphere of HD 143105 b is cloud-free down to approximately 1 bar and dominated by H$_2$O absorption ($\log \rm H_2O_{MMR} = -3.9^{+0.8}_{-0.5}$), while we place only an upper limit on the CO abundance ($\log \rm CO_{MMR} < -3.7$ at 95% confidence). We place no constraints on the abundances of Fe, Mg, or $^{13}$CO. From these abundances, we place an upper limit on the carbon-to-oxygen ratio of HD 143105 b, $\rm C/O < 0.2$ at 95% confidence, and find an atmospheric metallicity of approximately $0.1\times$ solar. The low metallicity may be responsible for the lack of a thermal inversion, which at the temperature of HD 143105 b would likely require significant opacity from TiO and/or VO. With these results, HD 143105 b joins the small number of non-transiting hot Jupiters with detected atmospheres.
Submitted 5 December, 2024;
originally announced December 2024.
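The step from the measured $K_p$ to the inclination and true mass of HD 143105 b can be written out as a short textbook derivation. This is a sketch, not the paper's fitting procedure: it assumes a circular orbit and $M_p \ll M_\star$, takes $M_p \sin i$ to be the minimum mass from the stellar radial-velocity orbit, and uses $R_\star$, $R_p$, and $a$ for the stellar radius, planetary radius, and semi-major axis.

```latex
v_{\rm orb} = \frac{2\pi a}{P} = \left(\frac{2\pi G M_\star}{P}\right)^{1/3},
\qquad
\sin i = \frac{K_p}{v_{\rm orb}},
\qquad
M_p = \frac{M_p \sin i}{\sin i},
\qquad
\text{no transit} \;\Rightarrow\; \cos i > \frac{R_\star + R_p}{a}.
```

The first relation follows from Kepler's third law, $a^3 = G M_\star P^2 / (4\pi^2)$; the last condition is what allows the non-detection of a transit to cap the inclination from above.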
-
A Context-aware Framework for Translation-mediated Conversations
Authors:
José Pombal,
Sweta Agrawal,
Patrick Fernandes,
Emmanouil Zaranis,
André F. T. Martins
Abstract:
Automatic translation systems offer a powerful solution for bridging language barriers in scenarios where participants do not share a common language. However, these systems can introduce errors that lead to misunderstandings and conversation breakdowns. A key issue is that current systems fail to incorporate the rich contextual information needed to resolve ambiguities and omitted details, resulting in literal, inappropriate, or misaligned translations. In this work, we present a framework that improves large language model-based translation systems by incorporating contextual information in bilingual conversational settings during training and inference. We validate the proposed framework on two task-oriented domains: customer chat and user-assistant interaction. Across both settings, the system produced by our framework, TowerChat, consistently yields better translations than state-of-the-art systems such as GPT-4o and TowerInstruct, as measured by multiple automatic translation quality metrics on several language pairs. We also show that the resulting model leverages context in an intended and interpretable way, improving consistency between the conveyed message and the generated translations.
Submitted 29 June, 2025; v1 submitted 5 December, 2024;
originally announced December 2024.
-
Celestial $sw_{1+\infty}$ algebra in Einstein-Yang-Mills theory
Authors:
Shreyansh Agrawal,
Panagiotis Charalambous,
Laura Donnay
Abstract:
From a study of the subleading structure of the asymptotic equations of motion in Einstein-Yang-Mills theory, we construct charges that are conserved up to quadratic order in non-radiative vacuum. We then show that these higher spin charges obey the celestial $sw_{1+\infty}$ symmetry algebra found earlier from the OPE of positive-helicity conformally soft gluons and gravitons.
Submitted 30 April, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Large sample scaling analysis of the Zig-Zag algorithm for Bayesian inference
Authors:
Sanket Agrawal,
Joris Bierkens,
Gareth O. Roberts
Abstract:
Piecewise deterministic Markov processes provide scalable methods for sampling from posterior distributions in big-data settings by admitting principled sub-sampling strategies that do not bias the output. An important example is the Zig-Zag process of [Ann. Stats. 47 (2019) 1288 - 1320], where clever sub-sampling has been shown to produce an essentially independent sample at a cost that does not scale with the size of the data. However, sub-sampling also leads to slower convergence and poor mixing of the process, a behaviour that calls into question the promised scalability of the algorithm. We provide a large-sample scaling analysis of the Zig-Zag process and its sub-sampling versions in settings of parametric Bayesian inference. In the transient phase of the algorithm, we show that the Zig-Zag trajectories are well approximated by the solution to a system of ODEs. These ODEs, which are explicitly characterized in the paper, possess a drift in the direction of decreasing KL divergence between the assumed model and the true distribution. In the stationary phase, we give weak convergence results for different versions of the Zig-Zag process. Based on our results, we estimate that for large data sets of size n, using suitable control variates with sub-sampling in Zig-Zag, the algorithm costs O(1) to obtain an essentially independent sample, a computational speed-up of O(n) over the canonical version of Zig-Zag and other traditional MCMC methods.
Submitted 22 November, 2024;
originally announced November 2024.
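To make the Zig-Zag process discussed above concrete, here is a minimal sketch of the canonical one-dimensional Zig-Zag sampler (no sub-sampling and no control variates, so it is not one of the sub-sampled versions the paper analyses). It assumes a standard normal target, chosen because the switching rate $\max(0, v\theta(s))$ along each linear segment can be integrated and inverted exactly; the function name and its time-averaged outputs are illustrative choices.

```python
import math
import random
from typing import Tuple


def zigzag_gaussian(n_events: int, seed: int = 0) -> Tuple[float, float]:
    """Canonical 1-D Zig-Zag sampler targeting a standard normal distribution.

    Potential U(theta) = theta**2 / 2, so U'(theta) = theta. On a segment
    theta(s) = theta0 + v*s with v in {-1, +1}, the switching rate is
    max(0, v*theta0 + s); its integral is inverted exactly to get event times.
    Returns continuous-time averages estimating E[theta] and E[theta**2].
    """
    rng = random.Random(seed)
    theta, v = 0.0, 1.0
    total_time = first_moment = second_moment = 0.0
    for _ in range(n_events):
        a = v * theta
        e = rng.expovariate(1.0)  # Exp(1) draw for the integrated rate
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # Exact integrals of theta(s) and theta(s)**2 over [0, tau] (v**2 == 1).
        first_moment += theta * tau + v * tau ** 2 / 2.0
        second_moment += theta ** 2 * tau + theta * v * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        theta += v * tau  # move deterministically to the switching point
        v = -v  # flip the velocity at the event
    return first_moment / total_time, second_moment / total_time
```

With a long run, the two returned averages should approach 0 and 1, the mean and second moment of the standard normal. Sub-sampling versions would replace the exact rate $\theta$ with an unbiased estimate built from a random data point, which is where the mixing issues described in the abstract arise.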
-
Multi-Agent Best Arm Identification in Stochastic Linear Bandits
Authors:
Sanjana Agrawal,
Saúl A. Blanco
Abstract:
We study the problem of collaborative best-arm identification in stochastic linear bandits under a fixed-budget scenario. In our learning model, we first consider multiple agents connected through a star network, interacting with a linear bandit instance in parallel. We then extend our analysis to arbitrary network topologies. The agents aim to collaboratively identify the best arm of the given bandit instance, with the help of a central server, while minimizing the probability of error in best-arm estimation. To this end, we propose two algorithms, MaLinBAI-Star and MaLinBAI-Gen, for star networks and networks with arbitrary structure, respectively. Both algorithms combine G-optimal design with a successive-elimination strategy in which agents share their knowledge through a central server at each communication round. We demonstrate, both theoretically and empirically, that our algorithms achieve an exponentially decaying probability of error in the allocated time budget. Furthermore, experimental results on both synthetic and real-world data validate the effectiveness of our algorithms over existing state-of-the-art multi-agent algorithms.
Submitted 24 May, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
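The G-optimal design step mentioned in the MaLinBAI abstract can be illustrated with the classical Fedorov-Wynn (Frank-Wolfe) iteration. By the Kiefer-Wolfowitz theorem, the design that minimises the maximum leverage $\max_a a^\top A(\pi)^{-1} a$ achieves the value $d$, the dimension of the arms. This is a generic sketch: the paper may compute its designs differently, and the function names, tolerance, and toy arms are hypothetical.

```python
from typing import List, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _leverages(arms: List[List[float]], weights: List[float]) -> List[float]:
    """Return a^T A(pi)^{-1} a for every arm, where A(pi) = sum_a pi_a a a^T."""
    d = len(arms[0])
    design = [[sum(w * x[i] * x[j] for w, x in zip(weights, arms)) for j in range(d)]
              for i in range(d)]
    return [sum(xi * yi for xi, yi in zip(x, _solve(design, x))) for x in arms]


def g_optimal_design(arms: List[List[float]], tol: float = 1e-2,
                     max_iters: int = 10_000) -> Tuple[List[float], float]:
    """Fedorov-Wynn iteration; returns design weights and the final max leverage."""
    d = len(arms[0])
    weights = [1.0 / len(arms)] * len(arms)  # uniform start (arms must span R^d)
    for _ in range(max_iters):
        lev = _leverages(arms, weights)
        best = max(range(len(arms)), key=lambda k: lev[k])
        g = lev[best]
        if g <= d * (1.0 + tol):
            break  # within tol of the Kiefer-Wolfowitz optimum d
        step = (g / d - 1.0) / (g - 1.0)  # exact line-search step for D-optimality
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights, max(_leverages(arms, weights))
```

In a fixed-budget elimination scheme, each surviving arm would then be pulled roughly in proportion to its design weight, which keeps the worst-case variance of the estimated rewards small.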
-
Small-$x$ evolution of dipole amplitude in momentum space: forward--off-forward correspondence
Authors:
Sanskriti Agrawal,
Raktim Abir
Abstract:
We have shown that the small-$x$ evolution of the off-forward leading-log dipole scattering amplitudes, both pomeron and odderon, in momentum space can be completely determined by the evolution of the respective forward amplitudes with rescaled momenta. In position space, if there is translation symmetry (the assumption of a large nucleus), the dipole cross section depends on the positions of the quarks and antiquarks only through their separation. The present study is the equivalent proposition in momentum space, where translation symmetry in momentum bifurcates the amplitudes into two translationally symmetric functions along the ${\bf k}$ line in the ${\bf k}-{\bf \Delta}$ plane. It also shows that the high-energy evolution of dipole GTMDs can be obtained solely by studying the evolution of dipole TMDs at small $x$.
Submitted 23 November, 2024; v1 submitted 19 November, 2024;
originally announced November 2024.
-
Value Imprint: A Technique for Auditing the Human Values Embedded in RLHF Datasets
Authors:
Ike Obi,
Rohan Pant,
Srishti Shekhar Agrawal,
Maham Ghazanfar,
Aaron Basiletti
Abstract:
LLMs are increasingly fine-tuned using RLHF datasets to align them with human preferences and values. However, little research has investigated which specific human values are operationalized through these datasets. In this paper, we introduce Value Imprint, a framework for auditing and classifying the human values embedded within RLHF datasets. To investigate the viability of this framework, we conducted three case-study experiments, auditing the Anthropic/hh-rlhf, OpenAI WebGPT Comparisons, and Alpaca GPT-4-LLM datasets to examine the human values embedded within them. Our analysis involved a two-phase process. In the first phase, we developed a taxonomy of human values through an integrated review of prior work in philosophy, axiology, and ethics, and then applied this taxonomy to annotate 6,501 RLHF preferences. In the second phase, we used the labels generated from the annotation as ground-truth data to train a transformer-based machine learning model that audits and classifies the three RLHF datasets. Through this approach, we found that information-utility values, including Wisdom/Knowledge and Information Seeking, were the most dominant human values in all three RLHF datasets. In contrast, prosocial and democratic values, including Well-being, Justice, and Human/Animal Rights, were the least represented. These findings have significant implications for developing language models that align with societal values and norms. We contribute our datasets to support further research in this area.
Submitted 18 November, 2024;
originally announced November 2024.
-
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Authors:
Miguel Moura Ramos,
Tomás Almeida,
Daniel Vareta,
Filipe Azevedo,
Sweta Agrawal,
Patrick Fernandes,
André F. T. Martins
Abstract:
Reinforcement learning (RL) has proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, which leads to inefficient learning signals because of the reward sparsity problem: the model receives a single score for the entire sentence. To address this, we propose a novel approach that uses RL with fine-grained, token-level quality assessments along with error severity levels. Specifically, we use xCOMET, a state-of-the-art quality estimation system, as our token-level reward model. We conduct experiments on small and large translation datasets with standard encoder-decoder and large language model-based machine translation systems, comparing the impact of sentence-level versus fine-grained reward signals on translation quality. Our results show that training with token-level rewards improves translation quality across language pairs over baselines according to both automatic and human evaluation. Furthermore, token-level reward optimization improves training stability, as evidenced by a steady increase in mean rewards over training epochs.
Submitted 16 April, 2025; v1 submitted 8 November, 2024;
originally announced November 2024.
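The contrast drawn above between a single sentence-level score and token-level rewards can be sketched as follows. The sketch assumes the quality-estimation model returns error spans (token start, exclusive end, severity) tagged minor, major, or critical; the severity-to-penalty weights, the function name, and the overlap rule are hypothetical illustrations, not the mapping used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical severity-to-penalty mapping (MQM-style weights chosen for illustration).
SEVERITY_PENALTY: Dict[str, float] = {"minor": -1.0, "major": -5.0, "critical": -10.0}


def token_rewards(num_tokens: int,
                  error_spans: List[Tuple[int, int, str]],
                  ok_reward: float = 0.0) -> List[float]:
    """Turn span-level error annotations into a per-token reward vector.

    error_spans holds (start, end, severity) in token indices, end exclusive.
    Tokens outside every span get ok_reward; where spans overlap, the
    harshest penalty wins.
    """
    rewards = [ok_reward] * num_tokens
    for start, end, severity in error_spans:
        penalty = SEVERITY_PENALTY[severity]
        for t in range(max(start, 0), min(end, num_tokens)):
            rewards[t] = min(rewards[t], penalty)
    return rewards


if __name__ == "__main__":
    # Six tokens, a minor error on tokens 1-2 and an overlapping major error on tokens 2-3.
    print(token_rewards(6, [(1, 3, "minor"), (2, 4, "major")]))
```

Each token then receives its own learning signal, so an RL update can push down probability mass on the erroneous tokens specifically instead of on the whole sentence.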
-
Microscopy of bosonic charge carriers in staggered magnetic fields
Authors:
Annabelle Bohrdt,
David Wei,
Daniel Adler,
Kritsana Srakaew,
Suchita Agrawal,
Pascal Weckesser,
Immanuel Bloch,
Fabian Grusdt,
Johannes Zeiher
Abstract:
The interplay of spin and charge degrees of freedom is believed to underlie various unresolved phenomena in strongly correlated systems. Quantum simulators based on neutral atoms provide an excellent testbed for investigating such phenomena and resolving their microscopic origins. Up to now, the majority of experimental and theoretical studies have focused on systems with fermionic exchange statistics. Here we expand the existing cold-atom toolbox through the use of negative-temperature states, which enables us to realize an antiferromagnetic, bosonic $t-J$ model in two spatial dimensions, subject to a strong staggered magnetic field, in a quantum gas microscope. By comparing the spreading dynamics of a single hole in a Néel versus a spin-polarized initial state, we establish the relevance of memory effects resulting from the buildup of strong spin-charge correlations in the dynamics of charge carriers in antiferromagnets. We further numerically predict rich dynamics of pairs of doped holes, which we show to be bound by a similar memory effect, while their center of mass can expand freely. Our work paves the way for the systematic exploration of the effect of antiferromagnetic spin ordering on the properties of individual charge carriers as well as finite-doping phases: our study demonstrates that the staggered field can be used to single out the effect of antiferromagnetism and offers the prospect of preparing low-temperature states in the near future.
Submitted 25 October, 2024;
originally announced October 2024.
-
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability
Authors:
Sudhanshu Agrawal,
Wonseok Jeon,
Mingu Lee
Abstract:
Speculative decoding is a powerful technique that attempts to circumvent the autoregressive constraint of modern Large Language Models (LLMs). Speculative decoding techniques aim to reduce the average inference time of a large target model without sacrificing its accuracy, by using a more efficient draft model to propose draft tokens that are then verified in parallel. The number of draft tokens produced in each drafting round is referred to as the draft length and is often a static hyperparameter chosen based on the acceptance-rate statistics of the draft tokens. However, a static draft length can hurt performance, especially in scenarios where drafting is expensive and the number of accepted tokens has high variance. Adaptive Entropy-based Draft Length (AdaEDL) is a simple, training-free and parameter-free criterion that allows for early stopping of the token drafting process by approximating a lower bound on the expected acceptance probability of the drafted token based on the currently observed entropy of the drafted logits. We show that AdaEDL consistently outperforms static draft-length speculative decoding by 10%-57%, as well as other training-free draft-stopping techniques by up to 10%, in a variety of settings and datasets. We also show that AdaEDL is more robust than these techniques and preserves performance in high-sampling-temperature scenarios. Because it is training-free, in contrast to techniques that rely on training dataset-specific draft-stopping predictors, AdaEDL can be seamlessly integrated into a variety of pre-existing LLM systems.
Submitted 23 October, 2024;
originally announced October 2024.
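The entropy-based early-stopping idea in the AdaEDL abstract can be sketched as a drafting loop that checks an entropy-derived acceptance estimate before proposing each token. The score $1 - \sqrt{\gamma H}$ used below is a hypothetical form, loosely modelled on the abstract's description; the paper's actual bound and any adaptive threshold are not reproduced. The greedy drafting, the function names, and the toy distribution are likewise illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a draft-model next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def acceptance_estimate(probs: Sequence[float], gamma: float = 1.0) -> float:
    """Hypothetical entropy-based proxy: higher entropy gives a lower acceptance estimate."""
    return 1.0 - math.sqrt(gamma * entropy(probs))


def draft_with_early_stop(next_token_dist: Callable[[List[int]], Sequence[float]],
                          max_len: int, threshold: float,
                          gamma: float = 1.0) -> List[int]:
    """Draft up to max_len tokens, stopping early once the estimate drops below threshold."""
    draft: List[int] = []
    for _ in range(max_len):
        probs = next_token_dist(draft)
        if acceptance_estimate(probs, gamma) < threshold:
            break  # stop drafting and hand the current draft to the target model
        draft.append(max(range(len(probs)), key=lambda i: probs[i]))  # greedy token
    return draft


if __name__ == "__main__":
    # Toy draft model: confident for the first two positions, uniform afterwards.
    def toy_dist(prefix: List[int]) -> List[float]:
        return [0.97, 0.01, 0.01, 0.01] if len(prefix) < 2 else [0.25] * 4

    print(draft_with_early_stop(toy_dist, max_len=5, threshold=0.3))
```

Because the check uses only the draft model's own distribution, it needs no trained predictor, which matches the training-free property the abstract highlights.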
-
Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure
Authors:
Chaoyun Zhang,
Randolph Yao,
Si Qin,
Ze Li,
Shekhar Agrawal,
Binit R. Mishra,
Tri Tran,
Minghua Ma,
Qingwei Lin,
Murali Chintalapati,
Dongmei Zhang
Abstract:
The presence of unhealthy nodes in cloud infrastructure signals potential machine failures, which can significantly impact the availability and reliability of cloud services and result in negative customer experiences. Effective unhealthy node mitigation is therefore vital for sustaining cloud system performance. This paper introduces Deoxys, a causal inference engine tailored to recommending mitigation actions for unhealthy nodes in cloud systems, in order to minimize virtual machine downtime and interruptions during unhealthy events. It employs double machine learning combined with causal forests to produce precise and reliable mitigation recommendations based solely on limited observational data collected from historical unhealthy events. To enhance the causal inference model, Deoxys further incorporates a policy fallback mechanism based on model uncertainty and action overriding mechanisms to (i) improve the reliability of the system, and (ii) strike a good tradeoff between downtime reduction and resource utilization, thereby enhancing overall system performance.
After deploying Deoxys in a large-scale cloud infrastructure at Microsoft, our observations demonstrate that Deoxys significantly reduces average VM downtime, by 53% compared to a legacy policy, while yielding a 49.5% lower VM interruption rate. This substantial improvement enhances the reliability and stability of cloud platforms, resulting in a seamless customer experience.
Submitted 23 October, 2024;
originally announced October 2024.
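The uncertainty-based policy fallback described in the Deoxys abstract can be sketched as a small decision rule on top of per-action effect estimates. Here those estimates are simply given as inputs; producing them with double machine learning and causal forests is out of scope. The `EffectEstimate` type, the confidence-interval-width threshold, the "benefit must be positive" rule, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class EffectEstimate:
    """Hypothetical per-action estimate of downtime reduction relative to the legacy action."""
    mean: float      # estimated reduction in VM downtime (higher is better)
    ci_width: float  # width of the confidence interval, used as an uncertainty proxy


def choose_action(estimates: Dict[str, EffectEstimate], legacy_action: str,
                  max_ci_width: float) -> str:
    """Recommend the best sufficiently certain action, else fall back to the legacy policy."""
    confident = {a: e for a, e in estimates.items() if e.ci_width <= max_ci_width}
    if not confident:
        return legacy_action  # model too uncertain about every action: fall back
    best_action = max(confident, key=lambda a: confident[a].mean)
    if confident[best_action].mean <= 0.0:
        return legacy_action  # no action is estimated to beat the legacy policy
    return best_action


if __name__ == "__main__":
    estimates = {
        "reboot": EffectEstimate(mean=0.3, ci_width=0.1),
        "live_migrate": EffectEstimate(mean=0.5, ci_width=0.8),
    }
    # live_migrate looks better on average but is too uncertain, so reboot is recommended.
    print(choose_action(estimates, "legacy", max_ci_width=0.5))
```

Filtering on uncertainty before ranking trades some expected benefit for reliability, which is the motivation the abstract gives for the fallback mechanism.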
-
Centrality-aware Product Retrieval and Ranking
Authors:
Hadeel Saadany,
Swapnil Bhosale,
Samarth Agrawal,
Diptesh Kanojia,
Constantin Orasan,
Zhe Wu
Abstract:
This paper addresses the challenge of improving user experience on e-commerce platforms by making product rankings more relevant to users' search queries. The ambiguity and complexity of user queries often lead to a mismatch between the user's intent and the retrieved product titles or documents. Recent approaches have proposed Transformer-based models, which need millions of annotated query-title pairs during the pre-training stage, and this data often does not take user intent into account. To tackle this, we curate samples from existing datasets at eBay, manually annotated with buyer-centric relevance scores and centrality scores, which reflect how well the product title matches the user's intent. We introduce a User-intent Centrality Optimization (UCO) approach for existing models, which optimises for user intent in semantic product search. To that end, we propose a dual-loss optimisation to handle hard negatives, i.e., product titles that are semantically relevant but do not reflect the user's intent. Our contributions include curating challenging evaluation sets and implementing UCO, which yields significant improvements in product ranking across different evaluation metrics. Our work aims to ensure that the most buyer-centric titles for a query are ranked higher, thereby enhancing the user experience on e-commerce platforms.
Submitted 21 October, 2024;
originally announced October 2024.
-
Findings of the WMT 2024 Shared Task on Chat Translation
Authors:
Wafaa Mohammed,
Sweta Agrawal,
M. Amin Farajian,
Vera Cabarrão,
Bryan Eikema,
Ana C. Farinha,
José G. C. de Souza
Abstract:
This paper presents the findings from the third edition of the Chat Translation Shared Task. As in previous editions, the task involved translating bilingual customer support conversations, with a specific focus on the impact of conversation context on translation quality and evaluation. This edition adds two new language pairs, English-Korean and English-Dutch, to the set of language pairs from previous editions: English-German, English-French, and English-Brazilian Portuguese. We received 22 primary submissions and 32 contrastive submissions from eight teams, with each language pair having participation from at least three teams. We evaluated the systems comprehensively using both automatic metrics and human judgments via a direct assessment framework. The official rankings for each language pair were determined based on human evaluation scores, considering performance in both translation directions (agent and customer). Our analysis shows that while the systems excelled at translating individual turns, there is room for improvement in overall conversation-level translation quality.
Submitted 15 October, 2024;
originally announced October 2024.
-
Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation
Authors:
Emmanouil Zaranis,
Giuseppe Attanasio,
Sweta Agrawal,
André F. T. Martins
Abstract:
Quality estimation (QE), the automatic assessment of translation quality, has recently become crucial across several stages of the translation pipeline, from data curation to training and decoding. While QE metrics have been optimized to align with human judgments, whether they encode social biases has been largely overlooked. Biased QE risks favoring certain demographic groups over others, e.g., by exacerbating gaps in visibility and usability. This paper defines and investigates gender bias in QE metrics and discusses its downstream implications for machine translation (MT). Experiments with state-of-the-art QE metrics across multiple domains, datasets, and languages reveal significant bias. When the gender of a human entity in the source is undisclosed, masculine-inflected translations score higher than feminine-inflected ones, and gender-neutral translations are penalized. Even when contextual cues disambiguate gender, context-aware QE metrics make more errors in selecting the correct translation inflection for feminine referents than for masculine ones. Moreover, a biased QE metric affects data filtering and quality-aware decoding. Our findings underscore the need for a renewed focus on gender when developing and evaluating QE metrics.
Submitted 2 June, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation
Authors:
Sweta Agrawal,
José G. C. de Souza,
Ricardo Rei,
António Farinhas,
Gonçalo Faria,
Patrick Fernandes,
Nuno M Guerreiro,
Andre Martins
Abstract:
Alignment with human preferences is an important step in developing accurate and safe large language models. Machine translation (MT) is no exception: better handling of language nuances and context-specific variations leads to improved quality. However, preference data based on human feedback can be very expensive to obtain and curate at a large scale. Automatic metrics, on the other hand, can induce preferences, but these might not match human expectations perfectly. In this paper, we propose an approach that leverages the best of both worlds. We first collect sentence-level quality assessments from professional linguists on translations generated by multiple high-quality MT systems and evaluate the ability of current automatic metrics to recover these preferences. We then use this analysis to curate a new dataset, MT-Pref (metric-induced translation preference), which comprises 18k instances covering 18 language directions, using texts sourced from multiple domains post-2022. We show that aligning TOWER models on MT-Pref significantly improves translation quality on the WMT23 and FLORES benchmarks.
Submitted 10 October, 2024;
originally announced October 2024.
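The idea of metric-induced preferences in the MT-Pref abstract can be illustrated by turning metric scores for several candidate translations of one source into a chosen/rejected pair. The abstract does not state the selection rule, so the rule below (best versus worst candidate, kept only when the score gap is at least `min_gap`), the function name, and the output field names are hypothetical; the actual dataset curation may combine several metrics and the human-agreement analysis.

```python
from typing import Dict, List, Optional, Tuple


def build_preference_pair(source: str, scored: List[Tuple[str, float]],
                          min_gap: float) -> Optional[Dict[str, str]]:
    """Turn metric-scored translations of one source into a (chosen, rejected) pair.

    Hypothetical rule: pair the best-scoring with the worst-scoring translation,
    and keep the pair only if the metric gap is at least min_gap, so that the
    induced preference is clear-cut.
    """
    if len(scored) < 2:
        return None
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    (best, best_score), (worst, worst_score) = ranked[0], ranked[-1]
    if best_score - worst_score < min_gap:
        return None  # metric preference too weak to trust
    return {"prompt": source, "chosen": best, "rejected": worst}


if __name__ == "__main__":
    candidates = [("translation A", 0.9), ("translation B", 0.5), ("translation C", 0.7)]
    print(build_preference_pair("source sentence", candidates, min_gap=0.2))
```

Pairs in this shape are the usual input format for preference-optimisation methods, which is how a dataset like MT-Pref can be used to align a translation model.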