-
LIGO/Virgo/KAGRA neutron star merger candidate S250206dm: Zwicky Transient Facility observations
Authors:
Tomás Ahumada,
Shreya Anand,
Mattia Bulla,
Vaidehi Gupta,
Mansi Kasliwal,
Robert Stein,
Viraj Karambelkar,
Eric C. Bellm,
Theophile Jegou du Laz,
Michael W. Coughlin,
Igor Andreoni,
Smaranika Banerjee,
Aleksandra Bochenek,
K-Ryan Hinds,
Lei Hu,
Antonella Palmese,
Daniel Perley,
Natalya Pletskova,
Anirudh Salgundi,
Avinash Singh,
Jesper Sollerman,
Vishwajeet Swain,
Avery Wold,
Varun Bhalerao,
S. Bradley Cenko
, et al. (27 additional authors not shown)
Abstract:
We present the searches conducted with the Zwicky Transient Facility (ZTF) in response to S250206dm, a bona fide event with a false alarm rate of one in 25 years, detected by the International Gravitational Wave Network (IGWN). Although the event is significant, the nature of the compact objects involved remains unclear, with at least one likely neutron star. ZTF covered 68% of the localization region, though we did not identify any likely optical counterpart. We describe the ZTF strategy, potential candidates, and the observations that helped rule out candidates, including sources circulated by other collaborations. Similar to Ahumada et al. 2024, we perform a frequentist analysis, using simsurvey, as well as a Bayesian analysis, using nimbus, to quantify the efficiency of our searches. We find that, given the nominal distance to this event of 373$\pm$104 Mpc, our efficiencies are above 10% for kilonovae (KNe) brighter than $-17.5$ absolute magnitude. Assuming the optical counterpart, a kilonova (KN), lies within the ZTF footprint, our limits constrain the brightest end of the KN parameter space. Through dedicated radiative transfer simulations of KNe from binary neutron star (BNS) and black hole-neutron star (BHNS) mergers, we exclude parts of the BNS KN parameter space. Up to 35% of the models with high wind ejecta mass ($M_{\rm wind} \approx 0.13$ M$_{\odot}$) are ruled out when viewed face-on ($\cos\theta_{\rm obs} = 1.0$). Finally, we present a joint analysis using the combined coverage from ZTF and the Gravitational Wave Multimessenger Dark Energy Camera Survey (GW-MMADS). The joint observations cover 73% of the localization region, and the combined efficiency has a stronger impact on rising and slowly fading models, allowing us to rule out 55% of the high-mass KN models viewed face-on.
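As a rough orientation (an illustration added here, not part of the ZTF analysis), the $-17.5$ mag threshold can be converted to an apparent magnitude with the standard distance modulus $m = M + 5\log_{10}(d/10\,\mathrm{pc})$. This minimal sketch ignores Galactic extinction and K-corrections and uses only the quoted 373$\pm$104 Mpc distance:

```python
import math

def apparent_magnitude(abs_mag: float, distance_mpc: float) -> float:
    """Distance modulus m = M + 5 log10(d / 10 pc); no extinction or K-correction."""
    distance_pc = distance_mpc * 1e6
    return abs_mag + 5.0 * math.log10(distance_pc / 10.0)

# Nominal distance and the quoted +/- 104 Mpc range for S250206dm.
for d in (373 - 104, 373, 373 + 104):
    print(f"{d:4d} Mpc: M = -17.5 -> m = {apparent_magnitude(-17.5, d):.2f}")
```

Under these assumptions a KN at $M = -17.5$ would appear at $m \approx 20.4$ mag at the nominal distance, and at roughly 19.6 to 20.9 mag across the quoted distance range.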
Submitted 30 June, 2025;
originally announced July 2025.
-
First results from the UTMOST-NS pulsar timing programme
Authors:
L. Dunn,
C. Flynn,
M. Bailes,
Y. S. C. Lee,
G. Howitt,
A. Melatos,
V. Gupta,
A. Mandlik,
A. Deller
Abstract:
The UTMOST-NS pulsar timing programme operated at the Molonglo Observatory Synthesis Telescope from April 2021 to June 2023, observing 173 pulsars with an average cadence of 50 pulsars per day. An overview of the programme is presented, detailing the hardware, software, and observing strategy. Pulsar timing results are discussed, focusing on timing noise and glitches. It is shown that the scaling of residuals due to timing noise with pulsar parameters and observing timespan is consistent with earlier studies, and that the recovered timing noise parameters remain consistent as the observing timespan is increased. Second frequency derivatives are investigated, and it is shown that the uncertainty on $\ddot{\nu}$ is sensitive to the frequency cutoff in the timing noise model, varying by approximately a factor of three depending on whether Fourier modes with frequencies lower than the reciprocal of the observing timespan are included. We measure 39 non-zero values of $\ddot{\nu}$ when considering both models, with and without low-frequency modes. An analytic scaling relating anomalous braking indices to timing noise amplitude is also validated. Glitches in the sample are discussed, including three detected by an "online" glitch detection pipeline using a hidden Markov model (HMM). In total, 17 glitches are discussed, one of which, in PSR J1902+0615, has not been reported elsewhere. An "offline" glitch search pipeline using the HMM framework is used to search for previously undetected glitches. Systematic upper limits are set on the size of undetected glitches. The mean upper limit is $\Delta\nu^{90\%}/\nu = 6.3 \times 10^{-9}$ at 90% confidence.
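The link between $\ddot{\nu}$ and "anomalous" braking indices can be illustrated with the standard definition $n = \nu\ddot{\nu}/\dot{\nu}^2$, which equals 3 for pure magnetic-dipole spin-down. The sketch below is a generic illustration, not the programme's timing pipeline; the pulsar parameters are hypothetical:

```python
def braking_index(nu, nu_dot, nu_ddot):
    """Braking index n = nu * nu_ddot / nu_dot**2."""
    return nu * nu_ddot / nu_dot**2

def power_law_spindown(nu, k, n):
    """Return (nu_dot, nu_ddot) for a spin-down law nu_dot = -k * nu**n with k > 0."""
    nu_dot = -k * nu**n
    # Chain rule: d(nu_dot)/dt = -k * n * nu**(n - 1) * nu_dot
    nu_ddot = -k * n * nu ** (n - 1) * nu_dot
    return nu_dot, nu_ddot

# Hypothetical pulsar: 10 Hz spin frequency with pure magnetic-dipole braking (n = 3).
nu = 10.0
nu_dot, nu_ddot = power_law_spindown(nu, k=1e-15, n=3.0)
print(braking_index(nu, nu_dot, nu_ddot))  # recovers n = 3 up to rounding

# A tiny additional nu_ddot (e.g. unmodelled timing noise) pushes n far from 3.
print(braking_index(nu, nu_dot, nu_ddot + 1e-22))  # roughly 1000
```

Because $\dot{\nu}^2$ is so small, even a minute spurious contribution to $\ddot{\nu}$ dominates $n$, which is why the treatment of timing noise matters for measured braking indices.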
Submitted 27 June, 2025;
originally announced June 2025.
-
Harnessing Piezoelectric Shear Actuators for Vibration Control in Sandwich Beams
Authors:
Mark Baken,
Vivek Gupta,
Bas Jansen,
S. Hassan HosseinNia
Abstract:
Our study found that integrating shear piezo-transducers inside the beam offers a compact and efficient solution that enables localized damping control without compromising structural integrity, whereas the conventional approach of placing the piezos outside the substrate faces challenges and has limited applicability in industrial settings. We determine the damping performance of long and slender sandwich beam structures using active vibration control with internally placed piezoelectric shear sensors and actuators. Experimental and numerical results are presented for a clamped-free sandwich beam structure consisting of two stainless steel facings enclosing a core composed of foam and a piezoelectric shear actuator and sensor. This internal actuator-sensor approach aims to address problems common in (high-tech) systems, namely mechanical vibrations, limited design volume, and the vulnerability of externally placed piezoelectric transducers to outside conditions. With this internal sensor-actuator approach, the study addresses a significant gap in the literature. The locations of the sensor and actuator have been determined by numerical investigation of the modal shear strain and the effective electro-mechanical coupling coefficient. The frequency response of the sandwich beam structure has been evaluated both numerically and experimentally. Positive Position Feedback (PPF) has been applied to the numerical model to simulate the damping performance for the fundamental mode. Different controller gains have been used to analyze the trade-off between effective resonance suppression and increased low-frequency gain. The tip vibration amplitude at the fundamental mode has been reduced from 5.01 mm to 0.34 mm at steady state, a reduction of roughly 93%.
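The gain trade-off described above can be illustrated with the textbook single-mode Positive Position Feedback formulation, in which a second-order filter driven by the structural displacement feeds back positively into the structure. This is a minimal sketch under stated assumptions: the 10 Hz mode, the damping ratios, and the filter tuning are hypothetical and not the beam parameters from the study:

```python
import math

def ppf_response(omega, omega_s, zeta_s, omega_f, zeta_f, gain):
    """Closed-loop |X/F| of one structural mode under Positive Position Feedback.

    Structure:   x'' + 2*zeta_s*omega_s*x' + omega_s**2 * x = f + gain * omega_s**2 * eta
    Compensator: eta'' + 2*zeta_f*omega_f*eta' + omega_f**2 * eta = omega_f**2 * x
    """
    s = 1j * omega
    plant = s**2 + 2 * zeta_s * omega_s * s + omega_s**2
    compensator = omega_f**2 / (s**2 + 2 * zeta_f * omega_f * s + omega_f**2)
    return abs(1.0 / (plant - gain * omega_s**2 * compensator))

# Hypothetical mode: 10 Hz with 1% structural damping; PPF filter tuned to the mode.
omega_s, zeta_s = 2 * math.pi * 10.0, 0.01
omega_f, zeta_f = omega_s, 0.3

def peak_response(gain, n=20000):
    """Largest |X/F| over a sweep from 0.1 to 3.0 times the natural frequency."""
    return max(
        ppf_response(omega_s * (0.1 + 2.9 * k / n), omega_s, zeta_s, omega_f, zeta_f, gain)
        for k in range(n + 1)
    )

for g in (0.0, 0.3, 0.6):  # g < 1 is needed to keep the closed-loop static stiffness positive
    dc = ppf_response(0.0, omega_s, zeta_s, omega_f, zeta_f, g)
    print(f"g = {g}: peak |X/F| = {peak_response(g):.3e}, DC |X/F| = {dc:.3e}")
```

In this model the static compliance grows as $1/[\omega_s^2(1-g)]$, so a higher gain suppresses the resonant peak at the cost of a larger low-frequency response.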
Submitted 26 June, 2025;
originally announced June 2025.
-
PhotonSplat: 3D Scene Reconstruction and Colorization from SPAD Sensors
Authors:
Sai Sri Teja,
Sreevidya Chintalapati,
Vinayak Gupta,
Mukund Varma T,
Haejoon Lee,
Aswin Sankaranarayanan,
Kaushik Mitra
Abstract:
Advances in 3D reconstruction using neural rendering have enabled high-quality 3D capture. However, they often fail when the input imagery is corrupted by motion blur, due to fast motion of the camera or the objects in the scene. This work advances neural rendering techniques in such scenarios by using single-photon avalanche diode (SPAD) arrays, an emerging sensing technology capable of capturing images at extremely high speeds. However, SPADs present their own unique challenges in the form of binary images driven by stochastic photon arrivals. To address this, we introduce PhotonSplat, a framework designed to reconstruct 3D scenes directly from SPAD binary images, effectively navigating the noise vs. blur trade-off. Our approach incorporates a novel 3D spatial filtering technique to reduce noise in the renderings. The framework also supports both no-reference colorization using generative priors and reference-based colorization from a single blurry image, enabling downstream applications such as segmentation, object detection, and appearance editing. Additionally, we extend our method to incorporate dynamic scene representations, making it suitable for scenes with moving objects. We further contribute PhotonScenes, a real-world multi-view dataset captured with SPAD sensors.
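The noise vs. blur trade-off for binary SPAD frames can be illustrated with the commonly used single-photon model: with Poisson photon arrivals, a pixel fires in a frame with probability $1 - e^{-\lambda}$, and the flux $\lambda$ can be estimated from the fraction of frames that fired. This is a generic per-pixel sketch, not PhotonSplat's reconstruction; the flux value is hypothetical:

```python
import math
import random

def simulate_binary_frames(flux, n_frames, rng):
    """Binary frames for one pixel: a frame reads 1 if at least one photon arrived.

    With Poisson arrivals at `flux` photons per frame, P(frame == 1) = 1 - exp(-flux).
    """
    p_detect = 1.0 - math.exp(-flux)
    return [1 if rng.random() < p_detect else 0 for _ in range(n_frames)]

def estimate_flux(frames):
    """Maximum-likelihood flux (photons per frame) from the fraction of frames that fired."""
    fraction = sum(frames) / len(frames)
    if fraction >= 1.0:
        return math.inf  # every frame fired: the pixel is saturated
    return -math.log(1.0 - fraction)

rng = random.Random(0)
true_flux = 0.5
for n_frames in (10, 100, 10_000):
    estimate = estimate_flux(simulate_binary_frames(true_flux, n_frames, rng))
    print(f"{n_frames:>6} frames -> estimated flux {estimate:.3f} (true {true_flux})")
```

Summing few frames gives a noisy estimate, while summing many frames reduces noise but, for a moving camera or scene, spans a longer exposure and therefore more blur.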
Submitted 26 June, 2025;
originally announced June 2025.
-
The Low Mass Dwarf Host Galaxy of Non-Repeating FRB 20230708A
Authors:
August R. Muller,
Alexa C. Gordon,
Stuart D. Ryder,
Alexandra G. Mannings,
J. Xavier Prochaska,
Keith W. Bannister,
A. Bera,
N. D. R. Bhat,
Adam T. Deller,
Wen-fai Fong,
Marcin Glowacki,
Vivek Gupta,
J. N. Jahns-Schindler,
C. W. James,
Regina A. Jorgenson,
Lachlan Marnoch,
R. M. Shannon,
Nicolas Tejos,
Ziteng Wang
Abstract:
We present Very Large Telescope/X-Shooter spectroscopy for the host galaxies of 12 fast radio bursts (FRBs) detected by the Australian SKA Pathfinder (ASKAP) and observed through the ESO Large Programme "FURBY", which imposes strict selection criteria on the included FRBs and their host galaxies to produce a homogeneous and well-defined sample. We describe the data reduction and analysis of these spectra and report their redshifts, line-emission fluxes, and derived host properties. From the present sample, this paper focuses on the faint host of FRB 20230708A ($m_R = 22.53 \pm 0.02$), identified at low redshift ($z=0.1050$). Its faintness at this low redshift indicates an intrinsically very low-luminosity galaxy ($L \approx 10^8 L_\odot$), making it the lowest-luminosity non-repeating FRB host to date by a factor of $\sim 3$, and slightly dimmer than the lowest-luminosity host for repeating FRBs. Our SED fitting analysis reveals a low stellar mass ($M_* \approx 10^{8.0} M_\odot$), a low star formation rate (${\rm SFR} \approx 0.04 M_\odot \rm yr^{-1}$), and very low metallicity ($12+\log(\text{O}/\text{H})\sim(7.99-8.3)$), distinct from the more massive galaxies ($\log(M/M_\odot) \sim 10$) that are commonly identified for non-repeating FRBs. Its discovery demonstrates that FRBs can arise in some of the faintest, most metal-poor galaxies in the universe. In turn, this suggests that at least one FRB progenitor channel must include stars (or their remnants) created in very low metallicity environments. This indicates better prospects for detecting FRBs from the high-$z$ universe, where young, low-mass galaxies proliferate.
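For orientation, a rough conversion from the quoted apparent magnitude and redshift to a luminosity is sketched below. The cosmology (flat $\Lambda$CDM with $H_0 = 67.7$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m = 0.31$), the solar absolute magnitude in $R$ (taken as 4.6), and the neglect of K-corrections and extinction are all assumptions made here; this is not the paper's SED-fitting procedure:

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def luminosity_distance_mpc(z, h0=67.7, omega_m=0.31, steps=1000):
    """Luminosity distance (Mpc) in flat LambdaCDM, integrating 1/E(z) with Simpson's rule."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))
    h = z / steps  # `steps` must be even for Simpson's rule
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    comoving_mpc = (C_KM_S / h0) * total * h / 3.0
    return (1.0 + z) * comoving_mpc

def absolute_magnitude(m_app, z):
    """M = m - 5 log10(D_L / 10 pc); no K-correction or extinction correction."""
    d_pc = luminosity_distance_mpc(z) * 1e6
    return m_app - 5.0 * math.log10(d_pc / 10.0)

M_SUN_R = 4.6  # assumed solar absolute magnitude in R (depends on filter and magnitude system)
m_r, z = 22.53, 0.1050
M_r = absolute_magnitude(m_r, z)
log_l = -0.4 * (M_r - M_SUN_R)  # log10(L / L_sun)
print(f"D_L ~ {luminosity_distance_mpc(z):.0f} Mpc, M_R ~ {M_r:.2f}, log10(L/L_sun) ~ {log_l:.2f}")
```

With these assumptions the script gives $D_L$ of roughly 500 Mpc, $M_R \approx -16.0$, and $\log_{10}(L/L_\odot) \approx 8.2$, i.e. of order $10^8 L_\odot$; the exact value shifts with the assumed solar magnitude and cosmology.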
Submitted 25 June, 2025;
originally announced June 2025.
-
PRAISE: Enhancing Product Descriptions with LLM-Driven Structured Insights
Authors:
Adnan Qidwai,
Srija Mukhopadhyay,
Prerana Khatiwada,
Dan Roth,
Vivek Gupta
Abstract:
Accurate and complete product descriptions are crucial for e-commerce, yet seller-provided information often falls short. Customer reviews offer valuable details but are laborious to sift through manually. We present PRAISE: Product Review Attribute Insight Structuring Engine, a novel system that uses Large Language Models (LLMs) to automatically extract, compare, and structure insights from customer reviews and seller descriptions. PRAISE provides users with an intuitive interface to identify missing, contradictory, or partially matching details between these two sources, presenting the discrepancies in a clear, structured format alongside supporting evidence from reviews. This allows sellers to easily enhance their product listings for clarity and persuasiveness, and buyers to better assess product reliability. Our demonstration showcases PRAISE's workflow, its effectiveness in generating actionable structured insights from unstructured reviews, and its potential to significantly improve the quality and trustworthiness of e-commerce product catalogs.
Submitted 18 June, 2025;
originally announced June 2025.
-
Large Scalable Cross-Domain Graph Neural Networks for Personalized Notification at LinkedIn
Authors:
Shihai He,
Julie Choi,
Tianqi Li,
Zhiwei Ding,
Peng Du,
Priya Bannur,
Franco Liang,
Fedor Borisyuk,
Padmini Jaikumar,
Xiaobing Xue,
Viral Gupta
Abstract:
Notification recommendation systems are critical to driving user engagement on professional platforms like LinkedIn. Designing such systems involves integrating heterogeneous signals across domains, capturing temporal dynamics, and optimizing for multiple, often competing, objectives. Graph Neural Networks (GNNs) provide a powerful framework for modeling complex interactions in such environments. In this paper, we present a cross-domain GNN-based system deployed at LinkedIn that unifies user, content, and activity signals into a single, large-scale graph. By training on this cross-domain structure, our model significantly outperforms single-domain baselines on key tasks, including click-through rate (CTR) prediction and professional engagement. We introduce architectural innovations including temporal modeling and multi-task learning, which further enhance performance. Deployed in LinkedIn's notification system, our approach led to a 0.10% lift in weekly active users and a 0.62% improvement in CTR. We detail our graph construction process, model design, training pipeline, and both offline and online evaluations. Our work demonstrates the scalability and effectiveness of cross-domain GNNs in real-world, high-impact applications.
Submitted 14 June, 2025;
originally announced June 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue
, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional-grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
Submitted 17 March, 2025;
originally announced June 2025.
-
A nanosecond-duration radio pulse originating from the defunct Relay 2 satellite
Authors:
C. W. James,
A. T. Deller,
T. Dial,
M. Glowacki,
S. J. Tingay,
K. W. Bannister,
A. Bera,
N. D. R. Bhat,
R. D. Ekers,
V. Gupta,
A. Jaini,
J. Morgan,
J. N. Jahns-Schindler,
R. M. Shannon,
M. Sukhov,
J. Tuthill,
Z. Wang
Abstract:
We report the detection of a burst of emission over a 695.5–1031.5 MHz bandwidth by the Australian Square Kilometre Array Pathfinder (ASKAP). The burst was localised, through analysis of near-field time delays, to the long-decommissioned Relay 2 satellite, and exhibited a dispersion measure of $2.26 \times 10^{-5}$ pc cm$^{-3}$ (69.7 TECU), consistent with expectations for a single pass through the ionosphere. After coherent dedispersion, the burst was determined to be less than 30 ns in width, with an average flux density of at least 300 kJy. We consider an electrostatic discharge (ESD) or a plasma discharge following a micrometeoroid impact to be plausible explanations for the burst. ESDs have previously been observed with the Arecibo radio telescope, but on timescales 1000 times longer. Our observation opens new possibilities for the remote sensing of ESD, which poses a serious threat to spacecraft, and reveals a new source of false events for observations of astrophysical transients.
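The two quoted dispersion values are related by a unit conversion (1 pc = $3.0857 \times 10^{16}$ m and 1 TECU = $10^{16}$ electrons m$^{-2}$), and the standard cold-plasma dispersion law gives the delay swept across the observed band. This back-of-envelope check is an illustration added here, not taken from the paper:

```python
PC_IN_M = 3.0857e16   # one parsec in metres
TECU_PER_M2 = 1e16    # 1 TECU = 1e16 electrons per square metre
K_DM_MS = 4.148808    # dispersion constant: delay in ms for DM in pc cm^-3 and frequency in GHz

def dm_to_tecu(dm_pc_cm3):
    """Convert a dispersion measure (pc cm^-3) to total electron content (TECU)."""
    electrons_per_m2 = dm_pc_cm3 * PC_IN_M * 1e6  # cm^-3 -> m^-3 is a factor of 1e6
    return electrons_per_m2 / TECU_PER_M2

def dispersion_delay_ns(dm_pc_cm3, f_lo_ghz, f_hi_ghz):
    """Arrival delay of the low-frequency band edge relative to the high edge, in ns."""
    delay_ms = K_DM_MS * dm_pc_cm3 * (f_lo_ghz**-2 - f_hi_ghz**-2)
    return delay_ms * 1e6

dm = 2.26e-5
print(f"{dm_to_tecu(dm):.1f} TECU")
print(f"{dispersion_delay_ns(dm, 0.6955, 1.0315):.1f} ns across 695.5-1031.5 MHz")
```

The sweep comes out at about 106 ns, several times the sub-30 ns intrinsic width, which is consistent with the need for coherent dedispersion to resolve the burst.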
Submitted 13 June, 2025;
originally announced June 2025.
-
No Universal Prompt: Unifying Reasoning through Adaptive Prompting for Temporal Table Reasoning
Authors:
Kushagra Dixit,
Abhishek Rajgaria,
Harshavardhan Kalalbandi,
Dan Roth,
Vivek Gupta
Abstract:
Temporal Table Reasoning is a critical challenge for Large Language Models (LLMs), requiring effective prompting techniques to extract relevant insights. Despite the existence of multiple prompting methods, their impact on table reasoning remains largely unexplored. Furthermore, the performance of these models varies drastically across different table and context structures, making it difficult to determine an optimal approach. This work investigates multiple prompting techniques across diverse table types to determine optimal approaches for different scenarios. We find that performance varies with entity type, table structure, the need for additional context, and question complexity, with no single method consistently outperforming the others. To mitigate these challenges, we introduce SEAR, an adaptive prompting framework inspired by human reasoning that dynamically adjusts to context characteristics and integrates structured reasoning. Our results demonstrate that SEAR achieves superior performance across all table types compared to other baseline prompting techniques. Additionally, we explore the impact of table structure refactoring, finding that a unified representation enhances the model's reasoning.
Submitted 12 June, 2025;
originally announced June 2025.
-
Mapping the Spatial Distribution of Fast Radio Bursts within their Host Galaxies
Authors:
Alexa C. Gordon,
Wen-fai Fong,
Adam T. Deller,
Lachlan Marnoch,
Sungsoon Lim,
Eric W. Peng,
Keith W. Bannister,
Apurba Bera,
N. D. R. Bhat,
Tyson Dial,
Yuxin Dong,
Tarraneh Eftekhari,
Marcin Glowacki,
Kelly Gourdji,
Vivek Gupta,
Joscha N. Jahns-Schindler,
Akhil Jaini,
Charles D. Kilpatrick,
Chang Liu,
J. Xavier Prochaska,
Stuart D. Ryder,
Ryan M. Shannon,
Sunil Simha,
Nicolas Tejos,
Yuanming Wang
, et al. (1 additional authors not shown)
Abstract:
We present deep optical and near-infrared observations of the host galaxies of 34 fast radio bursts (FRBs) detected by the Commensal Real-time ASKAP Fast Transient (CRAFT) survey on the Australian SKA Pathfinder (ASKAP) to compare the locations of FRBs relative to their host light distributions. Incorporating three additional FRBs from the literature, for a total of four repeating and 33 apparently non-repeating FRBs, we determine their projected galactocentric offsets and find a median of $4.2^{+5.7}_{-2.5}$ kpc ($1.0^{+1.5}_{-0.6}\,r_e$). We model their host surface brightness profiles and develop synthetic spatial distributions of their globular clusters based on host properties. We calculate the likelihood that the observed location of each FRB is consistent with the smooth light of its host galaxy, residual (primarily spiral) substructure, or globular cluster distributions. The majority of FRBs favor locations within the disks of their galaxies, while only 7–13% favor a globular cluster origin, primarily those with galactocentric offsets $\gtrsim 3\,r_e$. At $z<0.15$, where spiral structure is apparent in 86% of our sample of FRB hosts, we find that $\approx 20$–$46\%$ of FRBs favor an association with spiral arms. Assuming FRBs derive from magnetars, our results support multiple formation channels, with the majority of progenitors associated with massive stars and a minority formed through dynamical channels. However, the moderate fraction of FRBs associated with spiral structure indicates that high star formation efficiency of the youngest and most massive stars is not a predominant driver in the production of FRB progenitors.
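Offsets quoted in units of $r_e$ can be related to a host's smooth light with a Sérsic profile $I(r) \propto \exp\{-b_n[(r/r_e)^{1/n} - 1]\}$, where $r_e$ is by definition the half-light radius. The sketch below computes the fraction of model light enclosed within a given offset; it is a generic fractional-light illustration, not the authors' likelihood calculation, and the host parameters are hypothetical. It uses the widely used asymptotic approximation $b_n \approx 2n - 1/3 + 4/(405n)$:

```python
import math

def sersic_b(n):
    """Asymptotic approximation to b_n, chosen so that r_e encloses half the light."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n)

def sersic_intensity(r, r_e, n):
    """Sersic surface brightness normalised so that I(r_e) = 1."""
    return math.exp(-sersic_b(n) * ((r / r_e) ** (1.0 / n) - 1.0))

def light_within(radius, r_e, n, steps=100_000):
    """Integrate 2*pi*r*I(r) from 0 to `radius` with the midpoint rule."""
    dr = radius / steps
    return sum(
        2.0 * math.pi * r * sersic_intensity(r, r_e, n) * dr
        for r in ((i + 0.5) * dr for i in range(steps))
    )

def enclosed_light_fraction(offset, r_e, n, r_max_factor=50.0):
    """Fraction of the model host light inside a projected galactocentric offset."""
    return light_within(offset, r_e, n) / light_within(r_max_factor * r_e, r_e, n)

# Hypothetical host with r_e = 2 kpc and an FRB at an offset of 1.0 r_e.
for n in (1.0, 4.0):  # exponential-disk and de Vaucouleurs profiles
    print(f"n = {n}: fraction of light within 1 r_e = {enclosed_light_fraction(2.0, 2.0, n):.3f}")
```

Both profiles enclose about half of their light within $1\,r_e$, so an FRB at the sample-median offset sits near the half-light radius of its host regardless of the profile shape.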
Submitted 6 June, 2025;
originally announced June 2025.
-
LLM-Symbolic Integration for Robust Temporal Tabular Reasoning
Authors:
Atharv Kulkarni,
Kushagra Dixit,
Vivek Srikumar,
Dan Roth,
Vivek Gupta
Abstract:
Temporal tabular question answering presents a significant challenge for Large Language Models (LLMs), requiring robust reasoning over structured data, which is a task where traditional prompting methods often fall short. These methods face challenges such as memorization, sensitivity to table size, and reduced performance on complex queries. To overcome these limitations, we introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations, alongside a symbolic intermediate representation that transforms tables into database schemas. This structured approach allows LLMs to generate and execute SQL queries, enhancing generalization and mitigating biases. By incorporating adaptive few-shot prompting with contextually tailored examples, our method achieves superior robustness, scalability, and performance. Experimental results consistently highlight improvements across key challenges, setting a new benchmark for robust temporal reasoning with LLMs.
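The idea of turning a table into a database schema and answering temporal questions with SQL can be sketched with Python's built-in `sqlite3` module. The table, its column names, and the questions are hypothetical, and the SQL is hand-written here, whereas in the described method an LLM would generate it:

```python
import sqlite3

# Hypothetical infobox-style temporal table (illustrative values, not from TempTabQA-C).
rows = [
    ("Alice Example", "Team A", 2010, 2014),
    ("Alice Example", "Team B", 2015, 2019),
    ("Alice Example", "Team C", 2020, 2023),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE career (person TEXT, team TEXT, start_year INTEGER, end_year INTEGER)")
conn.executemany("INSERT INTO career VALUES (?, ?, ?, ?)", rows)

# "Which team was Alice playing for in 2017?"
query = "SELECT team FROM career WHERE person = ? AND ? BETWEEN start_year AND end_year"
print(conn.execute(query, ("Alice Example", 2017)).fetchall())  # [('Team B',)]

# "How many seasons did Alice play in total?" (inclusive year ranges)
total_query = "SELECT SUM(end_year - start_year + 1) FROM career WHERE person = ?"
print(conn.execute(total_query, ("Alice Example",)).fetchone()[0])  # 14
```

Executing the query delegates date arithmetic and aggregation to the database engine rather than to the model's free-form reasoning.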
Submitted 6 June, 2025;
originally announced June 2025.
-
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents
Authors:
Manan Suri,
Puneet Mathur,
Nedim Lipka,
Franck Dernoncourt,
Ryan A. Rossi,
Vivek Gupta,
Dinesh Manocha
Abstract:
Flowcharts are a critical tool for visualizing decision-making processes. However, their non-linear structure and complex visual-textual relationships make them challenging to interpret with LLMs, as vision-language models frequently hallucinate nonexistent connections and decision paths when analyzing these diagrams. This compromises the reliability of automated flowchart processing in critical domains such as logistics, health, and engineering. We introduce the task of Fine-grained Flowchart Attribution, which traces the specific flowchart components that ground an LLM response referring to the flowchart. Flowchart Attribution ensures the verifiability of LLM predictions and improves explainability by linking generated responses to the flowchart's structure. We propose FlowPathAgent, a neurosymbolic agent that performs fine-grained post hoc attribution through graph-based reasoning. It first segments the flowchart, then converts it into a structured symbolic graph, and finally employs an agentic approach that dynamically interacts with the graph to generate attribution paths. Additionally, we present FlowExplainBench, a novel benchmark for evaluating flowchart attributions across diverse styles, domains, and question types. Experimental results show that FlowPathAgent mitigates visual hallucinations in LLM answers over flowchart QA, outperforming strong baselines by 10-14% on our proposed FlowExplainBench dataset.
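Once a flowchart has been converted into a symbolic graph, an attribution path can be viewed as the path through that graph consistent with the decisions an answer relies on. The toy sketch below uses a hypothetical flowchart and a plain breadth-first search; it does not reproduce FlowPathAgent's segmentation or agentic graph interaction:

```python
from collections import deque

# Hypothetical flowchart as a symbolic graph: node -> list of (edge label, next node).
FLOWCHART = {
    "start": [("", "order received?")],
    "order received?": [("yes", "in stock?"), ("no", "wait")],
    "in stock?": [("yes", "ship order"), ("no", "reorder stock")],
    "reorder stock": [("", "ship order")],
    "wait": [],
    "ship order": [],
}

def attribution_path(graph, start, target, decisions):
    """Shortest node path from start to target that respects the given decision labels.

    `decisions` maps a decision node to the branch label the answer relies on;
    nodes without a recorded decision may follow any outgoing edge.
    Returns None if no consistent path exists.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        required = decisions.get(node)
        for label, nxt in graph.get(node, []):
            if required is not None and label != required:
                continue  # branch contradicts the answer's stated decision
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

print(attribution_path(FLOWCHART, "start", "ship order",
                       {"order received?": "yes", "in stock?": "no"}))
```

Returning `None` when no consistent path exists gives a simple signal that an answer asserts a connection the flowchart does not contain.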
Submitted 2 June, 2025;
originally announced June 2025.
-
REDDIX-NET: A Novel Dataset and Benchmark for Moderating Online Explicit Services
Authors:
MSVPJ Sathvik,
Manan Roy Choudhury,
Rishita Agarwal,
Sathwik Narkedimilli,
Vivek Gupta
Abstract:
The rise of online platforms has enabled covert illicit activities, including online prostitution, that pose challenges for detection and regulation. In this study, we introduce REDDIX-NET, a novel benchmark dataset specifically designed for moderating online sexual services and going beyond traditional NSFW filters. The dataset is derived from thousands of web-scraped NSFW posts on Reddit and categorizes users into six behavioral classes reflecting different service offerings and user intentions. We evaluate the classification performance of state-of-the-art large language models (GPT-4, LlaMA 3.3-70B-Instruct, Gemini 1.5 Flash, Mistral 8x7B, Qwen 2.5 Turbo, Claude 3.5 Haiku) using advanced quantitative metrics, finding promising results with models like GPT-4 and Gemini 1.5 Flash. Beyond classification, we conduct sentiment and comment analysis, leveraging LLM and PLM-based approaches and metadata extraction to uncover behavioral and temporal patterns. These analyses reveal peak engagement times and distinct user interaction styles across categories. Our findings provide critical insights into AI-driven moderation and enforcement, offering a scalable framework for platforms to combat online prostitution and associated harms.
Submitted 29 May, 2025;
originally announced May 2025.
-
Map&Make: Schema Guided Text to Table Generation
Authors:
Naman Ahuja,
Fenil Bardoliya,
Chitta Baral,
Vivek Gupta
Abstract:
Transforming dense, detailed, unstructured text into an interpretable and summarised table, commonly known as Text-to-Table generation, is an essential task for information retrieval. Current methods, however, struggle to determine what complex information to extract and how to extract it; they also lack the ability to infer data from the text. In this paper, we introduce a versatile approach, Map&Make, which "dissects" text into propositional atomic statements. This facilitates granular decomposition to extract the latent schema. The schema is then used to populate tables that capture the qualitative nuances and the quantitative facts in the original text. Our approach is tested on two challenging datasets: Rotowire, renowned for its complex and multi-table schema, and Livesum, which demands numerical aggregation. By carefully identifying and correcting hallucination errors in Rotowire, we aim to achieve a cleaner and more reliable benchmark. We evaluate our method rigorously on a comprehensive suite of comparative and referenceless metrics. Our findings demonstrate significant improvements across both datasets, with better interpretability in Text-to-Table generation. Moreover, through detailed ablation studies and analyses, we investigate the factors contributing to superior performance and validate the practicality of our framework in structured summarization tasks.
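The step from atomic statements to a latent schema and a populated table can be sketched as follows. The statements are hypothetical (entity, attribute, value) triples such as might be extracted from a basketball game summary; the LLM-driven "dissection" step of Map&Make is not reproduced, only the schema inference and table population that follow it:

```python
# Hypothetical atomic statements as (entity, attribute, value) triples.
statements = [
    ("Team A", "points", 102),
    ("Team B", "points", 95),
    ("Team A", "rebounds", 44),
    ("Player X", "points", 31),
    ("Team B", "rebounds", 40),
]

def build_table(atoms):
    """Infer a schema (attributes in first-seen order) and populate one row per entity."""
    schema = list(dict.fromkeys(attr for _, attr, _ in atoms))
    rows = {}
    for entity, attr, value in atoms:
        rows.setdefault(entity, {})[attr] = value
    # Cells with no supporting statement stay None rather than being guessed.
    table = [[entity] + [cells.get(attr) for attr in schema] for entity, cells in rows.items()]
    return ["entity"] + schema, table

header, table = build_table(statements)
print(header)
for row in table:
    print(row)
```

Leaving unsupported cells empty, rather than filling them, keeps every populated value traceable to a specific atomic statement.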
Submitted 29 May, 2025;
originally announced May 2025.
-
TabXEval: Why this is a Bad Table? An eXhaustive Rubric for Table Evaluation
Authors:
Vihang Pancholi,
Jainit Bafna,
Tejas Anvekar,
Manish Shrivastava,
Vivek Gupta
Abstract:
Evaluating tables qualitatively and quantitatively poses a significant challenge, as standard metrics often overlook subtle structural and content-level discrepancies. To address this, we propose a rubric-based evaluation framework that integrates multi-level structural descriptors with fine-grained contextual signals, enabling more precise and consistent table comparison. Building on this, we introduce TabXEval, an eXhaustive and eXplainable two-phase evaluation framework. TabXEval first aligns reference and predicted tables structurally via TabAlign, then performs semantic and syntactic comparison using TabCompare, offering interpretable and granular feedback. We evaluate TabXEval on TabXBench, a diverse, multi-domain benchmark featuring realistic table perturbations and human annotations. A sensitivity-specificity analysis further demonstrates the robustness and explainability of TabXEval across varied table tasks. Code and data are available at https://coral-lab-asu.github.io/tabxeval/
Submitted 1 June, 2025; v1 submitted 28 May, 2025;
originally announced May 2025.
-
GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
Authors:
Shikhhar Siingh,
Abhinav Rawat,
Chitta Baral,
Vivek Gupta
Abstract:
Publicly significant images from events hold valuable contextual information, crucial for journalism and education. However, existing methods often struggle to extract this relevance accurately. To address this, we introduce GETReason (Geospatial Event Temporal Reasoning), a framework that moves beyond surface-level image descriptions to infer deeper contextual meaning. We propose that extracting global event, temporal, and geospatial information enhances understanding of an image's significance. Additionally, we introduce GREAT (Geospatial Reasoning and Event Accuracy with Temporal Alignment), a new metric for evaluating reasoning-based image understanding. Our layered multi-agent approach, assessed using a reasoning-weighted metric, demonstrates that meaningful insights can be inferred, effectively linking images to their broader event context.
Submitted 2 June, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning
Authors:
Prasham Yatinkumar Titiya,
Jainil Trivedi,
Chitta Baral,
Vivek Gupta
Abstract:
Multimodal tables, which integrate semi-structured data with visual elements such as charts and maps, are ubiquitous across real-world domains, yet they pose a formidable challenge to current vision-language models (VLMs). While large language models (LLMs) and VLMs have demonstrated strong capabilities in text and image understanding, their performance on complex, real-world multimodal table reasoning remains unexplored. To bridge this gap, we introduce MMTBENCH (Multimodal Table Benchmark), a benchmark of 500 multimodal tables drawn from diverse real-world sources, with a total of 4021 question-answer pairs. MMTBENCH questions cover four question types (Explicit, Implicit, Answer Mention, and Visual Based), five reasoning types (Mathematical, Extrema Identification, Fact Verification, Vision Based, and Others), and eight table types (Single/Multiple Entity, Maps and Charts with Entities, Single/Multiple Charts, Maps, and Visualizations). Extensive evaluation of state-of-the-art models on all types reveals substantial performance gaps, particularly on questions requiring visual-based reasoning and multi-step inference. These findings underscore the urgent need for architectures that more tightly integrate vision and language processing. By mirroring the complexity of real-world tasks, MMTBENCH provides a challenging, high-quality resource for future research on multimodal tables.
Submitted 27 May, 2025;
originally announced May 2025.
-
Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective
Authors:
Krishna Singh Rajput,
Tejas Anvekar,
Chitta Baral,
Vivek Gupta
Abstract:
Recent advances in multimodal question answering have primarily focused on combining heterogeneous modalities or fine-tuning multimodal large language models. While these approaches have shown strong performance, they often rely on a single, generalized reasoning strategy that overlooks the unique characteristics of each modality, ultimately limiting both accuracy and interpretability. To address these limitations, we propose MAMMQA, a multi-agent QA framework for multimodal inputs spanning text, tables, and images. Our system includes two Visual Language Model (VLM) agents and one text-based Large Language Model (LLM) agent. The first VLM decomposes the user query into sub-questions and sequentially retrieves partial answers from each modality. The second VLM synthesizes and refines these results through cross-modal reasoning. Finally, the LLM integrates the insights into a cohesive answer. This modular design makes the reasoning process transparent, which improves interpretability, and allows each agent to operate within its domain of expertise. Experiments on diverse multimodal QA benchmarks demonstrate that our cooperative, multi-agent framework consistently outperforms existing baselines in both accuracy and robustness.
Submitted 27 May, 2025;
originally announced May 2025.
-
Biorderability of knot quandles of knots up to eight crossings
Authors:
Vaishnavi Gupta,
Hitesh Raundal
Abstract:
The paper investigates the biorderability of knot quandles of prime knots with up to eight crossings. We prove that the knot quandles of the knots $6_3$, $8_7$, $8_8$, $8_{10}$ and $8_{16}$ cannot be biorderable. However, we find that the knot quandles of the knots $4_1$, $6_1$, $6_2$, $7_6$, $7_7$, $8_1$, $8_2$, $8_3$, $8_4$, $8_5$, $8_6$, $8_9$, $8_{11}$, $8_{12}$, $8_{13}$, $8_{14}$, $8_{17}$, $8_{18}$, $8_{20}$ and $8_{21}$ could be biorderable. For each of these latter knots, we also give linear orders on the generating set of its knot quandle that could extend to biorders on the quandle.
Submitted 26 May, 2025;
originally announced May 2025.
-
Is Architectural Complexity Overrated? Competitive and Interpretable Knowledge Graph Completion with RelatE
Authors:
Abhijit Chakraborty,
Chahana Dahal,
Ashutosh Balasubramaniam,
Tejas Anvekar,
Vivek Gupta
Abstract:
We revisit the efficacy of simple, real-valued embedding models for knowledge graph completion and introduce RelatE, an interpretable and modular method that efficiently integrates dual representations for entities and relations. RelatE employs a real-valued phase-modulus decomposition, leveraging sinusoidal phase alignments to encode relational patterns such as symmetry, inversion, and composition. In contrast to recent approaches based on complex-valued embeddings or deep neural architectures, RelatE preserves architectural simplicity while achieving competitive or superior performance on standard benchmarks. Empirically, RelatE outperforms prior methods across several datasets: on YAGO3-10, it achieves an MRR of 0.521 and Hits@10 of 0.680, surpassing all baselines. Additionally, RelatE offers significant efficiency gains, reducing training time by 24%, inference latency by 31%, and peak GPU memory usage by 22% compared to RotatE. Perturbation studies demonstrate improved robustness, with MRR degradation reduced by up to 61% relative to TransE and by up to 19% relative to RotatE under structural edits such as edge removals and relation swaps. Formal analysis further establishes the model's full expressiveness and its capacity to represent essential first-order logical inference patterns. These results position RelatE as a scalable and interpretable alternative to more complex architectures for knowledge graph completion.
Submitted 25 May, 2025;
originally announced May 2025.
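The MRR and Hits@10 figures quoted above are standard link-prediction metrics. As a reminder of how they are usually computed, here is a minimal sketch that takes the rank of the correct entity for each test query; the ranks are made up for illustration, and the filtered-ranking protocol that produces them is assumed rather than taken from the paper.

```python
from typing import Sequence

def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("need at least one rank")
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of queries whose correct entity is ranked within the top k."""
    if not ranks:
        raise ValueError("need at least one rank")
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical filtered ranks of the true entity for five test triples.
ranks = [1, 3, 12, 2, 1]
print(round(mean_reciprocal_rank(ranks), 3))  # 0.583
print(hits_at_k(ranks, k=10))  # 0.8
```

MRR rewards placing the correct entity near the top, while Hits@10 only asks whether it made the top ten, which is why the two numbers are usually reported together.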
-
Weaver: Interweaving SQL and LLM for Table Reasoning
Authors:
Rohit Khoja,
Devanshu Gupta,
Yanjie Fu,
Dan Roth,
Vivek Gupta
Abstract:
Querying tables that contain unstructured data is challenging because of the text (or images), either embedded in the table or in external paragraphs, that traditional SQL struggles to process, especially for tasks requiring semantic reasoning. While Large Language Models (LLMs) excel at understanding context, they face limitations with long input sequences. Existing approaches that combine SQL and LLMs typically rely on rigid, predefined workflows, limiting their adaptability to complex queries. To address these issues, we introduce Weaver, a modular pipeline that dynamically integrates SQL and LLMs for table-based question answering (TableQA). Weaver generates a flexible, step-by-step plan that combines SQL for structured data retrieval with LLMs for semantic processing. By decomposing complex queries into manageable subtasks, Weaver improves accuracy and generalization. Our experiments show that Weaver consistently outperforms state-of-the-art methods across four TableQA datasets, reducing both API calls and error rates.
Submitted 24 May, 2025;
originally announced May 2025.
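The abstract describes plans that alternate SQL steps (structured retrieval) with LLM steps (semantic processing). The sketch below shows one way such a plan could be executed with Python's built-in sqlite3 module. The plan tuple format, the movies table, and fake_llm_sentiment (a keyword rule standing in for a real LLM call) are all hypothetical; Weaver's actual planner and step format are not described in the abstract.

```python
import sqlite3
from typing import Any, List, Tuple

def fake_llm_sentiment(text: str) -> str:
    """Hypothetical stand-in for an LLM call that labels a free-text review."""
    positive_cues = ("loved", "great", "brilliant")
    return "positive" if any(cue in text.lower() for cue in positive_cues) else "negative"

def run_plan(conn: sqlite3.Connection, plan: List[Tuple[Any, ...]]) -> List[Tuple[Any, ...]]:
    """Execute steps of the form ("sql", query) or
    ("llm", table, key_col, text_col, new_col, fn); return the last SQL result."""
    result: List[Tuple[Any, ...]] = []
    for step in plan:
        if step[0] == "sql":
            result = conn.execute(step[1]).fetchall()
        elif step[0] == "llm":
            _, table, key_col, text_col, new_col, fn = step
            # Identifiers are interpolated directly, so the plan must come from a trusted planner.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {new_col} TEXT")
            rows = conn.execute(f"SELECT {key_col}, {text_col} FROM {table}").fetchall()
            for key, text in rows:
                conn.execute(f"UPDATE {table} SET {new_col} = ? WHERE {key_col} = ?", (fn(text), key))
        else:
            raise ValueError(f"unknown step type: {step[0]!r}")
    return result

def build_demo_db() -> sqlite3.Connection:
    """Toy table whose 'review' column is free text that plain SQL cannot interpret."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, year INTEGER, review TEXT)")
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?, ?)",
        [(1, 2008, "Loved every minute"), (2, 2015, "A great sequel"),
         (3, 2019, "Dull and overlong"), (4, 2021, "Brilliant cast")],
    )
    return conn

# "How many movies released after 2010 got positive reviews?"
plan = [
    ("sql", "CREATE TABLE recent AS SELECT * FROM movies WHERE year > 2010"),
    ("llm", "recent", "id", "review", "sentiment", fake_llm_sentiment),
    ("sql", "SELECT COUNT(*) FROM recent WHERE sentiment = 'positive'"),
]
print(run_plan(build_demo_db(), plan))  # [(2,)]
```

Writing the LLM output back as a column lets later SQL steps filter or aggregate on it, which is the kind of hand-off between the two engines that the abstract describes.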
-
Federated Retrieval-Augmented Generation: A Systematic Mapping Study
Authors:
Abhijit Chakraborty,
Chahana Dahal,
Vivek Gupta
Abstract:
Federated Retrieval-Augmented Generation (Federated RAG) combines Federated Learning (FL), which enables distributed model training without exposing raw data, with Retrieval-Augmented Generation (RAG), which improves the factual accuracy of language models by grounding outputs in external knowledge. As large language models are increasingly deployed in privacy-sensitive domains such as healthcare, finance, and personalized assistance, Federated RAG offers a promising framework for secure, knowledge-intensive natural language processing (NLP). To the best of our knowledge, this paper presents the first systematic mapping study of Federated RAG, covering literature published between 2020 and 2025. Following Kitchenham's guidelines for evidence-based software engineering, we develop a structured classification of research focuses, contribution types, and application domains. We analyze architectural patterns, temporal trends, and key challenges, including privacy-preserving retrieval, cross-client heterogeneity, and evaluation limitations. Our findings synthesize a rapidly evolving body of research, identify recurring design patterns, and surface open questions, providing a foundation for future work at the intersection of RAG and federated systems.
Submitted 24 May, 2025;
originally announced May 2025.
-
UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification
Authors:
Poojah Ganesan,
Rajat Aayush Jha,
Dan Roth,
Vivek Gupta
Abstract:
Recent advances in large language models (LLMs) have greatly improved Text-to-SQL performance for single-table queries. However, Text-to-SQL remains challenging for multi-table databases because of complex schemas and relational operations. Existing methods often struggle to retrieve the right tables and columns, generate accurate JOINs and UNIONs, and generalize across diverse schemas. To address these issues, we introduce UNJOIN, a two-stage framework that decouples the retrieval of schema elements from SQL logic generation. In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name. This lets the model focus purely on accurate retrieval without being distracted by the need to write complex SQL logic. In the second stage, the SQL query is generated on this simplified schema and mapped back to the original schema by reconstructing JOINs, UNIONs, and relational logic. Evaluations on the SPIDER and BIRD datasets show that UNJOIN matches or exceeds state-of-the-art baselines. UNJOIN uses only schema information and requires neither data access nor fine-tuning, making it scalable and adaptable across databases.
Submitted 23 May, 2025;
originally announced May 2025.
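The first-stage schema flattening, and the JOIN part of the second-stage mapping back, can be sketched as follows. The `__` separator, the foreign-key tuple format, and the greedy join-path search are assumptions for illustration; the abstract does not specify them, and UNION reconstruction is not covered here.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, List[str]]
ForeignKey = Tuple[str, str, str, str]  # (table_a, col_a, table_b, col_b)

def flatten_schema(schema: Schema, sep: str = "__") -> List[str]:
    """Stage 1: present every column of every table as one wide table, prefixed by its table name."""
    return [f"{table}{sep}{col}" for table, cols in schema.items() for col in cols]

def unflatten(columns: List[str], sep: str = "__") -> Dict[str, List[str]]:
    """Map prefixed column names chosen on the flat schema back to {table: [columns]}."""
    grouped: Dict[str, List[str]] = {}
    for name in columns:
        table, col = name.split(sep, 1)  # assumes table names never contain sep
        grouped.setdefault(table, []).append(col)
    return grouped

def join_clause(tables: List[str], foreign_keys: List[ForeignKey]) -> str:
    """Stage 2 (JOIN part only): greedily connect the needed tables via known foreign keys."""
    joined = [tables[0]]
    clause = f"FROM {tables[0]}"
    remaining = set(tables[1:])
    while remaining:
        for t1, c1, t2, c2 in foreign_keys:
            if t1 in joined and t2 in remaining:
                new = t2
            elif t2 in joined and t1 in remaining:
                new = t1
            else:
                continue
            clause += f" JOIN {new} ON {t1}.{c1} = {t2}.{c2}"
            joined.append(new)
            remaining.discard(new)
            break
        else:
            # No foreign key links an already-joined table to any remaining one.
            raise ValueError(f"no direct foreign-key link to {sorted(remaining)}")
    return clause

schema = {"singer": ["singer_id", "name"], "concert": ["concert_id", "singer_id", "year"]}
print(flatten_schema(schema))
# Suppose the model, working on the flat schema, selected these two columns:
picked = unflatten(["singer__name", "concert__year"])
select = ", ".join(f"{t}.{c}" for t, cols in picked.items() for c in cols)
print(f"SELECT {select} " + join_clause(list(picked), [("concert", "singer_id", "singer", "singer_id")]))
# SELECT singer.name, concert.year FROM singer JOIN concert ON concert.singer_id = singer.singer_id
```

The greedy search only follows foreign keys between tables the query already needs; bridge tables that are not selected would require a real graph search over the schema.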
-
CRAFT: Training-Free Cascaded Retrieval for Tabular QA
Authors:
Adarsh Singh,
Kushal Raj Bhandari,
Jianxi Gao,
Soham Dan,
Vivek Gupta
Abstract:
Table Question Answering (TQA) involves retrieving relevant tables from a large corpus to answer natural language queries. Traditional dense retrieval models, such as DTR and ColBERT, not only incur high computational costs in large-scale retrieval but also require retraining or fine-tuning on new datasets, limiting their adaptability to evolving domains and knowledge. In this work, we propose $\textbf{CRAFT}$, a cascaded retrieval approach that first uses a sparse retrieval model to filter a subset of candidate tables before applying more computationally expensive dense models and neural re-rankers. Our approach achieves better retrieval performance than state-of-the-art (SOTA) sparse, dense, and hybrid retrievers. We further enhance table representations by generating table descriptions and titles with Gemini Flash 1.5. End-to-end TQA results with various Large Language Models (LLMs) on NQ-Tables, a subset of the Natural Questions dataset, demonstrate the effectiveness of $\textbf{CRAFT}$.
Submitted 20 May, 2025;
originally announced May 2025.
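A cascade of the kind described, a cheap sparse pass that shortlists tables followed by a costlier scorer applied only to the shortlist, can be sketched as below. Okapi BM25 stands in for the sparse retriever, and a character-trigram cosine similarity is a toy stand-in for the dense models and neural re-rankers that CRAFT actually uses; the table serializations and parameter values are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List

def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query (whitespace tokenization)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def trigram_cosine(a: str, b: str) -> float:
    """Toy stand-in for a dense scorer: cosine similarity of character-trigram counts."""
    def grams(s: str) -> Counter:
        s = s.lower()
        return Counter(s[i:i + 3] for i in range(len(s) - 2))
    ga, gb = grams(a), grams(b)
    dot = sum(count * gb[g] for g, count in ga.items())
    norm = math.sqrt(sum(v * v for v in ga.values())) * math.sqrt(sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0

def cascade_retrieve(query: str, docs: List[str], dense_score: Callable[[str, str], float],
                     first_k: int = 3, final_k: int = 1) -> List[int]:
    """Stage 1: keep the first_k BM25 hits. Stage 2: re-rank only those with the costlier scorer."""
    sparse = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: sparse[i], reverse=True)[:first_k]
    reranked = sorted(shortlist, key=lambda i: dense_score(query, docs[i]), reverse=True)
    return reranked[:final_k]

tables = [
    "olympic medal table 2012 country gold silver bronze",
    "list of tallest buildings city height floors",
    "olympic host cities year city country",
    "premier league top scorers player goals season",
]
query = "which country won the most gold medals at the 2012 olympics"
print(cascade_retrieve(query, tables, trigram_cosine))  # [0]
```

Only `first_k` tables ever reach the expensive scorer, which is where a cascade saves computation; in a real system the second stage would be an embedding model or cross-encoder rather than trigram overlap.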
-
Stable Real-Space Invariants and Topology Beyond Symmetry Indicators
Authors:
Yoonseok Hwang,
Vaibhav Gupta,
Frank Schindler,
Luis Elcoro,
Zhida Song,
B. Andrei Bernevig,
Barry Bradlyn
Abstract:
We show that certain band gaps, which appear topologically trivial from the perspective of symmetry indicators (SIs), must instead be topological, as guaranteed by real-space information that follows from Topological Quantum Chemistry (TQC). To address this, we introduce stable real-space invariants (SRSIs) that generalize the previously discovered local and composite real-space invariants to global topological invariants of a given set of bands. These are linear combinations of Wannier state multiplicities at Wyckoff positions and take the form of $\mathbb{Z}$- and $\mathbb{Z}_n$-valued quantities ($n=2,4$). We enumerate all $\mathbb{Z}$SRSIs and $\mathbb{Z}_n$SRSIs in all non-magnetic space groups (SGs) with and without spin-orbit coupling. SRSIs fully diagnose the stable equivalence of atomic insulators, ensuring that two atomic insulators with matching SRSIs are adiabatically deformable to one another in the presence of auxiliary trivial bands. For both atomic and topological bands, $\mathbb{Z}$SRSIs are determined by the momentum-space symmetry data and thus determine the SIs. $\mathbb{Z}_n$SRSIs provide additional information about trivial band structures not captured by momentum-space data. While split elementary band representations (EBRs), where the bands forming an EBR split into disconnected parts, must induce band topology, there are 211 cases across 51 SGs where the momentum-space data of an EBR decomposes linearly with positive integer coefficients into those of other EBRs. We demonstrate that $\mathbb{Z}_n$SRSIs successfully identify the band topology in the majority of these split EBR cases, diagnosing all but 8 cases in 5 SGs. Our results solidify the conceptual framework of TQC as containing, but going beyond, SIs and momentum-space symmetry data.
Submitted 14 May, 2025;
originally announced May 2025.
-
Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control
Authors:
Shivam Bajaj,
Prateek Jaiswal,
Vijay Gupta
Abstract:
The sim2real gap, in which the model learned in simulation is not an exact representation of the real system, can lead to a loss of stability and performance when controllers learned from simulated data are deployed on the real system. In this work, we address this challenge in the linear quadratic regulator (LQR) setting. Specifically, we consider an LQR problem for a system with unknown system matrices. Along with the state-action pairs from the system to be controlled, a trajectory of length $S$ of state-action pairs from a different unknown system is available. Our proposed algorithm is built on Thompson sampling and uses both the mean and the uncertainty of the dynamics of the system from which the trajectory of length $S$ is obtained. We establish that the algorithm achieves $\tilde{\mathcal{O}}(f(S,M_\delta)\sqrt{T/S})$ Bayes regret after $T$ time steps, where $M_\delta$ characterizes the \emph{dissimilarity} between the two systems and $f(S,M_\delta)$ is a function of $S$ and $M_\delta$. When $M_\delta$ is sufficiently small, the proposed algorithm achieves $\tilde{\mathcal{O}}(\sqrt{T/S})$ Bayes regret and outperforms a naive strategy that does not use the available trajectory.
Submitted 13 May, 2025;
originally announced May 2025.
-
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Authors:
Yusen Zhang,
Wenliang Zheng,
Aashrith Madasu,
Peng Shi,
Ryo Kamoi,
Hao Zhou,
Zhuoyang Zou,
Shu Zhao,
Sarkar Snigdha Sarathi Das,
Vipul Gupta,
Xiaoxin Lu,
Nan Zhang,
Ranran Haoran Zhang,
Avitej Iyer,
Renze Lou,
Wenpeng Yin,
Rui Zhang
Abstract:
High-resolution image (HRI) understanding aims to process images with a large number of pixels, such as pathological images and agricultural aerial images, both of which can exceed 1 million pixels. Vision Large Language Models (VLMs) can allegedly handle HRIs; however, there is no comprehensive benchmark for evaluating their HRI understanding. To address this gap, we introduce HRScene, a novel unified benchmark for HRI understanding with rich scenes. HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from 1,024 $\times$ 1,024 to 35,503 $\times$ 26,627. HRScene was collected and re-annotated by 10 graduate-level annotators and covers 25 scenarios, ranging from microscopic and radiology images to street views, long-range pictures, and telescope images. It includes HRIs of real-world objects, scanned documents, and composite multi-image inputs. The two diagnostic evaluation datasets are synthesized by combining the target image with the gold answer and distracting images in different orders, assessing how well models utilize regions in HRIs. We conduct extensive experiments involving 28 VLMs, including Gemini 2.0 Flash and GPT-4o. Experiments on HRScene show that current VLMs achieve an average accuracy of around 50% on real-world tasks, revealing significant gaps in HRI understanding. Results on the synthetic datasets reveal that VLMs struggle to effectively utilize HRI regions, showing significant Regional Divergence and lost-in-middle effects, which shed light on directions for future research.
Submitted 29 April, 2025; v1 submitted 25 April, 2025;
originally announced April 2025.
-
A generalized fundamental solution technique for the regularized 13-moment system in rarefied gas flows
Authors:
Himanshi,
Lambert Theisen,
Anirudh Singh Rana,
Manuel Torrilhon,
Vinay Kumar Gupta
Abstract:
In this work, we explore the method of fundamental solutions (MFS) for solving the regularized 13-moment (R13) equations for rarefied monatomic gases. While previous applications of the MFS in rarefied gas flows relied on problem-specific fundamental solutions, we propose a generic approach that systematically computes the fundamental solutions for any linear moment system without predefined source terms. The generalized framework is first introduced using a simple example involving the Stokes equations, and is then extended to the R13 equations. The results obtained from the generic MFS are validated against an analytical solution for the R13 equations. Following validation, the framework is applied to the case of thermally induced flow between two non-coaxial cylinders. Since no analytical solution exists for this case, we compare the results obtained from the MFS with those obtained from the finite element method (FEM). To further assess computational efficiency, we analyze the runtimes of the FEM and MFS. The results indicate that the MFS converges faster than the FEM and serves as a promising alternative to conventional mesh-based techniques.
Submitted 25 April, 2025;
originally announced April 2025.
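As a much smaller illustration of the method of fundamental solutions itself, the sketch below solves the 2D Laplace equation on the unit disk rather than the Stokes or R13 systems. The solution is written as a weighted sum of the known Laplace fundamental solution centered at source points placed outside the domain, and the weights are fixed by matching the boundary data at collocation points. The choice of equation, source radius, number of points, and the use of a closed-form fundamental solution (instead of the paper's generic computation) are assumptions made for brevity.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def laplace_fs(x: Point, y: Point) -> float:
    """Fundamental solution of the 2D Laplace equation: -ln|x - y| / (2*pi)."""
    return -math.log(math.hypot(x[0] - y[0], x[1] - y[1])) / (2.0 * math.pi)

def gauss_solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a dense square system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mfs_disk(g: Callable[[Point], float], n: int = 32,
             source_radius: float = 2.0) -> Callable[[Point], float]:
    """Approximate the harmonic u on the unit disk with u = g on the boundary circle."""
    angles = [2.0 * math.pi * k / n for k in range(n)]
    colloc = [(math.cos(t), math.sin(t)) for t in angles]  # boundary collocation points
    sources = [(source_radius * math.cos(t), source_radius * math.sin(t)) for t in angles]  # outside the domain
    matrix = [[laplace_fs(p, s) for s in sources] for p in colloc]
    weights = gauss_solve(matrix, [g(p) for p in colloc])
    return lambda p: sum(w * laplace_fs(p, s) for w, s in zip(weights, sources))

# u(x, y) = x is harmonic, so the MFS answer can be checked against it exactly.
u = mfs_disk(lambda p: p[0])
print(round(u((0.3, 0.2)), 6))  # 0.3
```

No mesh is involved: only boundary collocation points and exterior source locations are needed, which is the property that makes the MFS attractive compared with mesh-based methods such as FEM.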
-
Multilingual Performance Biases of Large Language Models in Education
Authors:
Vansh Gupta,
Sankalan Pal Chowdhury,
Vilém Zouhar,
Donya Rooein,
Mrinmaya Sachan
Abstract:
Large language models (LLMs) are increasingly being adopted in educational settings. These applications extend beyond English, though current LLMs remain primarily English-centric. In this work, we examine whether their use in educational settings in non-English languages is warranted. We evaluated the performance of popular LLMs on four educational tasks: identifying student misconceptions, providing targeted feedback, interactive tutoring, and grading translations, in six languages (Hindi, Arabic, Farsi, Telugu, Ukrainian, Czech) in addition to English. We find that performance on these tasks roughly corresponds to how well each language is represented in the training data, with lower-resource languages showing poorer task performance. Although the models perform reasonably well in most languages, the frequent drop in performance relative to English is significant. Thus, we recommend that practitioners first verify that the LLM works well in the target language for their educational task before deployment.
Submitted 24 April, 2025;
originally announced April 2025.
-
Evaluating energy inefficiency in energy-poor households in India: A frontier analysis approach
Authors:
Vallary Gupta,
Ahana Sarkar,
Chirag Deb,
Arnab Jana
Abstract:
Energy-poor households often compromise their thermal comfort and refrain from operating mechanical cooling devices to avoid high electricity bills. This is compounded by behavioral practices such as retaining older, less efficient appliances, which result in missed energy savings. Enhancing efficiency is therefore critical in these households. However, due to a lack of comprehensive data in India, little is understood about their electricity consumption patterns and usage efficiency. Estimating inefficiency and assessing its determinants is crucial for improving their quality of life. This study measures the inefficiency in electricity consumption due to household practices and appliances in social housing in Mumbai, India, considering technological determinants in addition to socio-economic variables. The study employs primary data collected from rehabilitation housing and slums in Mumbai. Stochastic frontier analysis, a parametric approach, is applied to estimate indicators of electricity consumption and inefficiency. While household size and workforce participation significantly affect consumption behavior in rehabilitation housing, only workforce participation does so in slums. Appliance ownership, except for washing machines in slums, also has considerable effects. The mean efficiency scores of 83% and 91% for rehabilitation housing and slums, respectively, empirically quantify the potential savings achievable. Factors that increase inefficiency include the operating duration of refrigerators, washing machines, irons, and air conditioners (ACs). These results have implications for increasing the uptake of efficient appliances and for accelerating energy efficiency retrofits in the region. Policies should focus on raising awareness and on developing appliance markets through incentives.
Submitted 23 April, 2025;
originally announced April 2025.
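For readers unfamiliar with stochastic frontier analysis, a common textbook specification of an electricity-demand frontier is given below. It is an assumption that the study uses this form, since the abstract does not state the functional form or the distribution of the inefficiency term; it is included only to show how an efficiency score maps to potential savings.

```latex
\begin{aligned}
\ln E_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i + u_i,
\qquad v_i \sim \mathcal{N}(0, \sigma_v^2), \quad u_i \ge 0, \\
\mathrm{EF}_i &= \frac{\exp(\mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i)}{E_i} = \exp(-u_i).
\end{aligned}
```

Here $E_i$ is household $i$'s electricity consumption, $\mathbf{x}_i$ its determinants, $v_i$ random noise, and $u_i$ the one-sided inefficiency term. Under this definition, a mean efficiency of 0.83 would mean households consume about $1/0.83 \approx 1.20$ times the frontier level, so roughly 17% of actual consumption could be saved; the 0.91 score for slums would correspond to about 9%.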
-
Online Model Order Reduction of Linear Systems via $(γ,δ)$-Similarity
Authors:
Shivam Bajaj,
Carolyn L. Beck,
Vijay Gupta
Abstract:
Model order reduction aims to determine a low-order approximation of a high-order model with the least possible approximation error. For application to physical systems, it is crucial that the reduced order model (ROM) be robust to any disturbance that acts on the full order model (FOM), in the sense that the output of the ROM remains a good approximation of that of the FOM even in the presence of such disturbances. In this work, we present a framework for online model order reduction for a class of continuous-time linear systems that ensures this property for any $\mathcal{L}_2$ disturbance. Apart from robustness to disturbances in this sense, the proposed framework also displays other desirable properties for model order reduction: (1) a provable bound on the error, defined as the $L_2$ norm of the difference between the outputs of the ROM and the FOM, (2) preservation of stability, (3) compositionality properties and a provable error bound for arbitrary interconnected systems, (4) a provable bound on the output of the FOM when the controller designed for the ROM is used with the FOM, and (5) compatibility with existing approaches such as balanced truncation and moment matching. Property (4) does not require the computation of any gap metric, and property (5) is beneficial because existing approaches can then also be equipped with some of the preceding properties. The theoretical results are corroborated by numerical case studies, including one on a building model.
Submitted 2 May, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
agriFrame: Agricultural framework to remotely control a rover inside a greenhouse environment
Authors:
Saail Narvekar,
Soofiyan Atar,
Vishal Gupta,
Lohit Penubaku,
Kavi Arya
Abstract:
Innovation in agriculture is essential for food security worldwide, and even more so in developing countries. With growing demand comes pressure to shorten development time. Data collection and analysis are essential in agriculture. However, a given crop's cycle comes only once a year, so researchers must wait a few months before collecting more data for that crop. To overcome this hurdle, researchers are turning to digital twins for agriculture. Toward this effort, we present an agricultural framework (agriFrame). Here, we introduce a simulated greenhouse environment for testing and controlling a robot, and for remotely controlling and implementing the algorithms in a real-world greenhouse setup. This work showcases the importance and interdependence of the network setup, the remotely controllable rover, and the messaging protocol. The sophisticated yet simple-to-use agriFrame has been optimized to run the simulator on laptops and desktops with minimal specifications.
Submitted 12 April, 2025;
originally announced April 2025.
-
Collaborative Prediction: Tractable Information Aggregation via Agreement
Authors:
Natalie Collina,
Ira Globus-Harris,
Surbhi Goel,
Varun Gupta,
Aaron Roth,
Mirah Shi
Abstract:
We give efficient "collaboration protocols" through which two parties, who observe different features about the same instances, can interact to arrive at predictions that are more accurate than either could have obtained on their own. The parties only need to iteratively share and update their own label predictions, without either party ever having to share the actual features that they observe. Our protocols are efficient reductions to the problem of learning on each party's feature space alone, and so can be used even in settings in which each party's feature space is illegible to the other, a situation that arises in models of human/AI interaction and in multi-modal learning. The communication requirements of our protocols are independent of the dimensionality of the data. In an online adversarial setting, we show how to give regret bounds on the predictions that the parties arrive at with respect to a class of benchmark policies defined on the joint feature space of the two parties, despite the fact that neither party has access to this joint feature space. We also give simpler algorithms for the same task in the batch setting, in which we assume that there is a fixed but unknown data distribution. We generalize our protocols to a decision-theoretic setting with high-dimensional outcome spaces, where parties communicate only "best response actions."
Our theorems give a computationally and statistically tractable generalization of past work on information aggregation amongst Bayesians who share a common and correct prior, as part of a literature studying "agreement" in the style of Aumann's agreement theorem. Our results require no knowledge of (or even the existence of) a prior distribution and are computationally efficient. Nevertheless, we show how to lift our theorems back to this classical Bayesian setting, and in doing so, give new information aggregation theorems for Bayesian agreement.
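For readers unfamiliar with the classical Bayesian setting referenced above, here is a toy sketch of the posterior-announcement dialogue between two Bayesians with a common prior (uniform here) over a finite set of states. It illustrates agreement in the Aumann style only; it is not the paper's prior-free protocol. The helper names, the states, the event, and the two information partitions are all hypothetical.

```python
from fractions import Fraction


def partition_map(blocks):
    """Map each state to the information cell (block) that contains it."""
    return {w: frozenset(b) for b in blocks for w in b}


def posterior(event, info):
    """P(event | info) under a uniform common prior over states."""
    return Fraction(len(event & info), len(info))


def agreement_dialogue(states, event, cell1, cell2, true_state, max_rounds=100):
    # public[w]: states consistent with the announcement history if w is the truth.
    public = {w: frozenset(states) for w in states}
    history = []  # announced posteriors at the true state, in speaking order
    for _ in range(max_rounds):
        changed = False
        for cell in (cell1, cell2):
            # The speaker's posterior as a function of the (unknown) state.
            f = {w: posterior(event, cell[w] & public[w]) for w in states}
            history.append(f[true_state])
            # Listeners rule out states where the speaker would have announced something else.
            refined = {w: frozenset(v for v in public[w] if f[v] == f[w]) for w in states}
            if refined != public:
                public, changed = refined, True
        if not changed:  # announcements carry no new information; posteriors are common knowledge
            break
    q1 = posterior(event, cell1[true_state] & public[true_state])
    q2 = posterior(event, cell2[true_state] & public[true_state])
    return q1, q2, history


states = range(4)
event = frozenset({0, 3})
cell1 = partition_map([{0, 1}, {2, 3}])
cell2 = partition_map([{0, 1, 2}, {3}])
q1, q2, history = agreement_dialogue(states, event, cell1, cell2, true_state=0)
print(q1, q2, history[:2])
```

In this example party 1 first announces 1/2 and party 2 announces 1/3; each announcement rules out states for the listener, and after a few rounds both posteriors at the true state settle at 1/2. Exact `Fraction` arithmetic is used so that "same announcement" is an exact equality test rather than a floating-point comparison.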
Submitted 8 April, 2025;
originally announced April 2025.
-
A Self-Supervised Learning Approach with Differentiable Optimization for UAV Trajectory Planning
Authors:
Yufei Jiang,
Yuanzhu Zhan,
Harsh Vardhan Gupta,
Chinmay Borde,
Junyi Geng
Abstract:
While Unmanned Aerial Vehicles (UAVs) have gained significant traction across various fields, path planning in 3D environments remains a critical challenge, particularly under size, weight, and power (SWAP) constraints. Traditional modular planning systems often introduce latency and suboptimal performance due to limited information sharing and local minima issues. End-to-end learning approaches streamline the pipeline by mapping sensory observations directly to actions, but they require large-scale datasets, face significant sim-to-real gaps, or lack dynamical feasibility. In this paper, we propose a self-supervised UAV trajectory planning pipeline that integrates learning-based depth perception with differentiable trajectory optimization. A 3D cost map guides UAV behavior without expert demonstrations or human labels. Additionally, we incorporate a neural network-based time allocation strategy to improve efficiency and optimality. The system thus combines robust learning-based perception with reliable physics-based optimization for improved generalizability and interpretability. Both simulation and real-world experiments validate our approach across various environments, demonstrating its effectiveness and robustness. Our method achieves a 31.33% improvement in position tracking error and a 49.37% reduction in control effort compared to the state-of-the-art.
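To illustrate, in the simplest possible case, what "differentiable trajectory optimization guided by a cost map" can mean, here is a toy gradient-descent sketch. It is an assumption-laden illustration, not the paper's pipeline: the "cost map" is a single hypothetical Gaussian obstacle, smoothness is a sum of squared second differences of the waypoints, there is no learned perception or time allocation, and all names and weights are made up.

```python
import numpy as np


def smoothness_cost_grad(P):
    """Sum of squared second differences (a discrete acceleration penalty) and its gradient."""
    D = P[:-2] - 2 * P[1:-1] + P[2:]  # second differences, shape (N-2, 3)
    G = np.zeros_like(P)
    G[:-2] += 2 * D
    G[1:-1] += -4 * D
    G[2:] += 2 * D
    return float(np.sum(D ** 2)), G


def obstacle_cost_grad(P, center, weight=5.0, sigma=1.0):
    """Toy Gaussian cost map around one obstacle, and its gradient."""
    diff = P - center
    c = weight * np.exp(-np.sum(diff ** 2, axis=1) / (2 * sigma ** 2))
    G = -(c / sigma ** 2)[:, None] * diff
    return float(c.sum()), G


def total_cost(P, center):
    return smoothness_cost_grad(P)[0] + obstacle_cost_grad(P, center)[0]


def optimize_waypoints(P0, center, steps=500, lr=0.01):
    """Plain gradient descent on the interior waypoints."""
    P = P0.copy()
    for _ in range(steps):
        _, gs = smoothness_cost_grad(P)
        _, go = obstacle_cost_grad(P, center)
        grad = gs + go
        grad[[0, -1]] = 0.0  # keep start and goal fixed
        P -= lr * grad
    return P


# Straight-line initial guess passing close to an obstacle slightly off the line
P0 = np.linspace([0.0, 0.0, 0.0], [10.0, 0.0, 0.0], 21)
center = np.array([5.0, -0.3, 0.0])
P = optimize_waypoints(P0, center)
print(total_cost(P0, center), total_cost(P, center))
```

With the step size well below the inverse smoothness constant of this cost, each gradient step decreases the total cost, and the interior waypoints bend away from the obstacle while the endpoints stay fixed.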
Submitted 5 April, 2025;
originally announced April 2025.
-
AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure
Authors:
The AIBrix Team,
Jiaxin Shan,
Varun Gupta,
Le Xu,
Haiyang Shi,
Jingyuan Zhang,
Ning Wang,
Linhui Xu,
Rong Kang,
Tongping Liu,
Yifei Zhang,
Yiqing Zhu,
Shuowei Jin,
Gangmuk Lim,
Binbin Chen,
Zuzhi Chen,
Xiao Liu,
Xin Chen,
Kante Yin,
Chak-Pong Chung,
Chenyu Jiang,
Yicheng Lu,
Jianjun Chen,
Caixue Lin,
Wu Xiang
, et al. (2 additional authors not shown)
Abstract:
We introduce AIBrix, a cloud-native, open-source framework designed to optimize and simplify large-scale LLM deployment in cloud environments. Unlike traditional cloud-native stacks, AIBrix follows a co-design philosophy, ensuring every layer of the infrastructure is purpose-built for seamless integration with inference engines like vLLM. AIBrix introduces several key innovations to reduce inference costs and enhance performance, including high-density LoRA management for dynamic adapter scheduling, LLM-specific autoscalers, and prefix-aware, load-aware routing. To further improve efficiency, AIBrix incorporates a distributed KV cache that boosts token reuse across nodes, leading to a 50% increase in throughput and a 70% reduction in inference latency. AIBrix also supports a unified AI runtime that streamlines model management while maintaining vendor-agnostic engine compatibility. For large-scale multi-node inference, AIBrix employs hybrid orchestration, leveraging Kubernetes for coarse-grained scheduling and Ray for fine-grained execution, to balance efficiency and flexibility. Additionally, an SLO-driven GPU optimizer dynamically adjusts resource allocations, optimizing heterogeneous serving to maximize cost efficiency while maintaining service guarantees. Finally, AIBrix enhances system reliability with AI accelerator diagnostic tools, enabling automated failure detection and mock-up testing to improve fault resilience. AIBrix is available at https://github.com/vllm-project/aibrix.
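The abstract names prefix-aware, load-aware routing without specifying the policy. Below is a hypothetical sketch of one simple policy of that kind: among replicas whose load is within `max_load_gap` of the least-loaded replica, pick the one holding the longest cached prefix of the incoming prompt, breaking ties by lower load. `Replica`, `route`, `max_load_gap`, and the load metric are illustrative assumptions, not AIBrix's actual router or API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    load: int = 0  # hypothetical load metric, e.g. in-flight requests
    cached: list = field(default_factory=list)  # token sequences with resident KV cache


def shared_prefix_len(a, b):
    """Length of the common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(tokens, replicas, max_load_gap=4):
    """Pick a replica for a prompt, trading prefix-cache reuse against load."""
    least = min(r.load for r in replicas)
    # Load-aware: ignore replicas that are much busier than the idlest one.
    eligible = [r for r in replicas if r.load - least <= max_load_gap]

    def score(r):
        # Prefix-aware: longest cached prefix first, then lower load.
        hit = max((shared_prefix_len(tokens, c) for c in r.cached), default=0)
        return (hit, -r.load)

    return max(eligible, key=score)


replicas = [
    Replica("a", load=3, cached=[[1, 2, 3, 4]]),
    Replica("b", load=0),
    Replica("c", load=10, cached=[[1, 2, 3, 4, 5, 6]]),
]
print(route([1, 2, 3, 4, 9], replicas).name)  # "a": "c" matches more but is overloaded
print(route([7, 8], replicas).name)           # "b": no cache hit, so the least-loaded wins
```

The load filter is what keeps a popular prefix from pinning all traffic to one hot replica; a production router would typically also account for cache eviction and request size, which this sketch ignores.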
Submitted 22 February, 2025;
originally announced April 2025.
-
Leveraging LLM For Synchronizing Information Across Multilingual Tables
Authors:
Siddharth Khincha,
Tushar Kataria,
Ankita Anand,
Dan Roth,
Vivek Gupta
Abstract:
The vast amount of online information today poses challenges for non-English speakers, as much of it is concentrated in high-resource languages such as English and French. Wikipedia reflects this imbalance, with content in low-resource languages frequently outdated or incomplete. Recent research has sought to improve cross-language synchronization of Wikipedia tables using rule-based methods. These approaches can be effective, but they struggle with complexity and generalization. This paper explores large language models (LLMs) for multilingual information synchronization, using zero-shot prompting as a scalable solution. We introduce the Information Updation dataset, which simulates the real-world process of updating outdated Wikipedia tables, and evaluate LLM performance on it. Our findings reveal that single-prompt approaches often produce suboptimal results, prompting us to introduce a task decomposition strategy that enhances coherence and accuracy. Our proposed method outperforms existing baselines, particularly in Information Updation (1.79%) and Information Addition (20.58%), highlighting the model's strength in dynamically updating and enriching data across architectures.
Submitted 4 April, 2025; v1 submitted 3 April, 2025;
originally announced April 2025.
-
TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
Authors:
Abhilash Shankarampeta,
Harsh Mahajan,
Tushar Kataria,
Dan Roth,
Vivek Gupta
Abstract:
Humans continuously make new discoveries, and understanding the temporal sequence of events leading to these breakthroughs is essential for advancing science and society. This ability to reason over time allows us to identify future steps and understand the effects of financial and political decisions on our lives. However, large language models (LLMs) are typically trained on static datasets, limiting their ability to perform effective temporal reasoning. To assess the temporal reasoning capabilities of LLMs, we present the TRANSIENTTABLES dataset, which comprises 3,971 questions derived from over 14,000 tables, spanning 1,238 entities across multiple time periods. We introduce a template-based question-generation pipeline that harnesses LLMs to refine both templates and questions. Additionally, we establish baseline results using state-of-the-art LLMs to create a benchmark. We also introduce novel modeling strategies centered on task decomposition that enhance LLM performance.
Submitted 2 April, 2025;
originally announced April 2025.
-
European Contributions to Fermilab Accelerator Upgrades and Facilities for the DUNE Experiment
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1322 additional authors not shown)
Abstract:
The Proton Improvement Plan (PIP-II) upgrade to the FNAL accelerator chain and the Long-Baseline Neutrino Facility (LBNF) will provide the world's most intense neutrino beam to the Deep Underground Neutrino Experiment (DUNE), enabling a wide-ranging physics program. This document outlines the significant contributions made by European national laboratories and institutes towards realizing the first phase of the project with a 1.2 MW neutrino beam. Construction of this first phase is well underway. For DUNE Phase II, this will be closely followed by an upgrade of the beam power to > 2 MW, for which the European groups again have a key role and which will require the continued support of the European community for machine aspects of neutrino physics. Beyond the neutrino beam aspects, LBNF is also responsible for providing unique infrastructure to install and operate the DUNE neutrino detectors at FNAL and at the Sanford Underground Research Facility (SURF). The cryostats for the first two Liquid Argon Time Projection Chamber detector modules at SURF, a contribution of CERN to LBNF, are central to the success of the ongoing execution of DUNE Phase I. Likewise, successful and timely procurement of cryostats for two additional detector modules at SURF will be critical to the success of DUNE Phase II and the overall physics program. The DUNE Collaboration is submitting four main contributions to the 2026 Update of the European Strategy for Particle Physics process. This paper is being submitted to the 'Accelerator technologies' and 'Projects and Large Experiments' streams. Additional inputs related to the DUNE science program, DUNE detector technologies and R&D, and DUNE software and computing are also being submitted to other streams.
Submitted 31 March, 2025;
originally announced March 2025.
-
DUNE Software and Computing Research and Development
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1322 additional authors not shown)
Abstract:
The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy toward the implementation of this leading-edge, large-scale science project. The ambitious physics program of Phase I and Phase II of DUNE is dependent upon deployment and utilization of significant computing resources, and successful research and development of software (both infrastructure and algorithmic) in order to achieve these scientific goals. This submission discusses the computing resource projections, infrastructure support, and software development needed for DUNE during the coming decades as an input to the European Strategy for Particle Physics Update for 2026. The DUNE collaboration is submitting four main contributions to the 2026 Update of the European Strategy for Particle Physics process. This submission to the 'Computing' stream focuses on DUNE software and computing. Additional inputs related to the DUNE science program, DUNE detector technologies and R&D, and European contributions to Fermilab accelerator upgrades and facilities for the DUNE experiment are also being submitted to other streams.
Submitted 31 March, 2025;
originally announced March 2025.
-
The DUNE Phase II Detectors
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1322 additional authors not shown)
Abstract:
The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy for the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the previous European Strategy for Particle Physics. The construction of DUNE Phase I is well underway. DUNE Phase II consists of a third and fourth far detector module, an upgraded near detector complex, and an enhanced > 2 MW beam. The fourth FD module is conceived as a 'Module of Opportunity', aimed at supporting the core DUNE science program while also expanding the physics opportunities with more advanced technologies. The DUNE collaboration is submitting four main contributions to the 2026 Update of the European Strategy for Particle Physics process. This submission to the 'Detector instrumentation' stream focuses on technologies and R&D for the DUNE Phase II detectors. Additional inputs related to the DUNE science program, DUNE software and computing, and European contributions to Fermilab accelerator upgrades and facilities for the DUNE experiment, are also being submitted to other streams.
Submitted 29 March, 2025;
originally announced March 2025.
-
The DUNE Science Program
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1322 additional authors not shown)
Abstract:
The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy for the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the previous European Strategy for Particle Physics. The construction of DUNE Phase I is well underway. DUNE Phase II consists of a third and fourth far detector module, an upgraded near detector complex, and an enhanced > 2 MW beam. The fourth FD module is conceived as a 'Module of Opportunity', aimed at supporting the core DUNE science program while also expanding the physics opportunities with more advanced technologies. The DUNE collaboration is submitting four main contributions to the 2026 Update of the European Strategy for Particle Physics process. This submission to the 'Neutrinos and cosmic messengers', 'BSM physics' and 'Dark matter and dark sector' streams focuses on the physics program of DUNE. Additional inputs related to DUNE detector technologies and R&D, DUNE software and computing, and European contributions to Fermilab accelerator upgrades and facilities for the DUNE experiment, are also being submitted to other streams.
Submitted 29 March, 2025;
originally announced March 2025.
-
Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing
Authors:
Vishnu Asutosh Dasu,
Md Rafi ur Rashid,
Vipul Gupta,
Saeid Tizpaz-Niari,
Gang Tan
Abstract:
This paper explores pruning attention heads as a post-processing bias mitigation method for large language models (LLMs). Modern AI systems such as LLMs are expanding into sensitive social contexts where fairness concerns become especially crucial. Since LLMs develop decision-making patterns by training on massive datasets of human-generated content, they naturally encode and perpetuate societal biases. While modifying training datasets and algorithms is expensive and requires significant resources, post-processing techniques, such as selectively deactivating neurons and attention heads in pre-trained LLMs, can provide feasible and effective approaches to improving fairness. However, identifying the optimal subset of parameters to prune presents a combinatorial challenge within LLMs' immense parameter space, requiring solutions that efficiently balance the competing objectives of model fairness and utility.
To address the computational challenges, we explore a search-based program repair approach via randomized simulated annealing. Given the prohibitive evaluation costs in billion-parameter LLMs, we develop surrogate deep neural networks that efficiently model the relationship between attention head states (active/inactive) and their corresponding fairness/utility metrics. This allows us to perform optimization over the surrogate models and efficiently identify optimal subsets of attention heads for selective pruning rather than directly searching through the LLM parameter space. This paper introduces Attention Pruning, a fairness-aware surrogate simulated annealing approach to prune attention heads in LLMs that disproportionately contribute to bias while minimally impacting overall model utility. Our experiments show that Attention Pruning achieves up to $40\%$ reduction in gender bias and outperforms the state-of-the-art bias mitigation strategies.
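To make the search step concrete, here is a minimal simulated-annealing sketch over binary head masks (1 = head kept, 0 = head pruned). The `surrogate` below is a hypothetical, hand-written linear stand-in with made-up per-head bias and utility numbers, not the paper's trained surrogate networks; the single-bit-flip proposal, the geometric cooling schedule, and the weight `lam` are likewise assumptions.

```python
import math
import random


def simulated_annealing(objective, n_heads, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Minimize objective(mask) over binary masks; mask[i] == 1 keeps head i active."""
    rng = random.Random(seed)
    mask = [1] * n_heads  # start from the unpruned model
    cur = objective(mask)
    best_mask, best = mask[:], cur
    t = t0
    for _ in range(steps):
        i = rng.randrange(n_heads)
        mask[i] ^= 1  # propose: toggle one head
        cand = objective(mask)
        # Always accept improvements; accept worse masks with Boltzmann probability.
        if cand <= cur or rng.random() < math.exp((cur - cand) / t):
            cur = cand
            if cur < best:
                best_mask, best = mask[:], cur
        else:
            mask[i] ^= 1  # reject: undo the toggle
        t *= cooling
    return best_mask, best


# Hypothetical surrogate outputs per head (illustrative numbers only)
bias = [0.9, 0.1, 0.5, 0.05, 0.7, 0.2, 0.3, 0.02]
utility = [0.2, 0.8, 0.3, 0.9, 0.6, 0.1, 0.5, 0.4]
lam = 1.0  # assumed weight trading bias against lost utility


def surrogate(mask):
    kept_bias = sum(b for b, m in zip(bias, mask) if m)
    lost_utility = sum(u for u, m in zip(utility, mask) if not m)
    return kept_bias + lam * lost_utility


best_mask, best = simulated_annealing(surrogate, n_heads=8)
print(best_mask, best)
```

Because this toy objective is separable, its optimum is simply to prune every head whose bias term exceeds its weighted utility term, which makes the annealer's answer easy to check by hand; with a learned, non-separable surrogate no such shortcut exists, which is why a search procedure is needed.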
Submitted 19 March, 2025;
originally announced March 2025.
-
Limits on the Ejecta Mass During the Search for Kilonovae Associated with Neutron Star-Black Hole Mergers: A case study of S230518h, GW230529, S230627c and the Low-Significance Candidate S240422ed
Authors:
M. Pillas,
S. Antier,
K. Ackley,
T. Ahumada,
D. Akl,
L. de Almeida,
S. Anand,
C. Andrade,
I. Andreoni,
K. A. Bostroem,
M. Bulla,
E. Burns,
T. Cabrera,
S. Chang,
H. Choi,
B. O'Connor,
M. W. Coughlin,
W. Corradi,
A. R. Gibbs,
T. Dietrich,
D. Dornic,
J. -G. Ducoin,
P. -A. Duverne,
M. Dyer,
H. -B. Eggenstein
, et al. (56 additional authors not shown)
Abstract:
Neutron star-black hole (NSBH) mergers, detectable via their gravitational-wave (GW) emission, are expected to produce kilonovae (KNe). Between the start of the fourth GW Observing Run (O4) in May 2023 and July 2024, four NSBH candidates were identified and followed up by more than fifty instruments; however, no associated KN has been confirmed. This study evaluates ejecta properties from multi-messenger observations to understand the absence of a detectable KN: we use public GW information and joint observations taken from May 2023 to July 2024 (LVK, ATLAS, DECam, GECKO, GOTO, GRANDMA, SAGUARO, TESS, WINTER, ZTF). First, our analysis of follow-up observation strategies shows that, on average, more than 50% of the simulated KNe associated with NSBH mergers reach their peak luminosity around one day after merger in the $g$, $r$, and $i$ bands, an epoch that is not necessarily covered for each NSBH GW candidate. We also analyze the trade-off between observation efficiency and the intrinsic properties of the KN emission, to understand how these constraints affect our ability to detect the KN and to infer the underlying ejecta properties for each GW candidate. In particular, we can confirm that the kilonova was not missed for only 1% of the sky localization regions of GW230529 and S230627c, given the large sky localization error of GW230529, the large distance of S230627c, and the faint luminosities of their respective KNe. The constraints are stronger for S230518h: we infer dynamical and post-merger disk wind ejecta masses $m_{\rm dyn}, m_{\rm wind} < 0.03\,M_\odot$ and a viewing angle $\theta > 25^\circ$. Similarly, the non-astrophysical origin of S240422ed is likely further confirmed by the fact that we would have detected even a faint KN at the time and presumed distance of the S240422ed event candidate, within at least the 45% credible region of the sky localization, a region that can be larger depending on the KN scenario.
Submitted 19 March, 2025;
originally announced March 2025.
-
Measurement of the inhomogeneity of the KATRIN tritium source electric potential by high-resolution spectroscopy of conversion electrons from $^{83m}$Kr
Authors:
H. Acharya,
M. Aker,
D. Batzler,
A. Beglarian,
J. Beisenkötter,
M. Biassoni,
B. Bieringer,
Y. Biondi,
F. Block,
B. Bornschein,
L. Bornschein,
M. Böttcher,
M. Carminati,
A. Chatrabhuti,
S. Chilingaryan,
B. A. Daniel,
M. Descher,
D. Díaz Barrero,
O. Dragoun,
G. Drexlin,
F. Edzards,
K. Eitel,
E. Ellinger,
R. Engel,
S. Enomoto
, et al. (108 additional authors not shown)
Abstract:
Precision spectroscopy of the electron spectrum of tritium $β$-decay near the kinematic endpoint is a direct method to determine the effective electron antineutrino mass. The KArlsruhe TRItium Neutrino (KATRIN) experiment aims to determine this quantity with a sensitivity of better than 0.3$\,$eV (90$\,$% C.L.). An inhomogeneous electric potential in the tritium source of KATRIN can distort the $β$-spectrum, which directly impacts the neutrino-mass observable. This effect can be quantified through precision spectroscopy of the conversion electrons of co-circulated metastable $^{83m}$Kr. Therefore, dedicated measurement campaigns, each several weeks long, have been performed within the KATRIN data-taking schedule. In this work, we infer the tritium source potential observables from these measurements and present their implications for the neutrino-mass determination.
Submitted 17 March, 2025;
originally announced March 2025.
-
Laplace-Net: Learning Dynamical Systems with External Forcing
Authors:
Bernd Zimmering,
Cecília Coelho,
Vaibhav Gupta,
Maria Maleshkova,
Oliver Niggemann
Abstract:
Modelling forced dynamical systems, in which an external input drives the system state, is critical across diverse domains such as engineering, finance, and the natural sciences. In this work, we propose Laplace-Net, a decoupled, solver-free neural framework for learning forced and delay-aware systems. It leverages a Laplace transform-based approach to decompose internal dynamics, external inputs, and initial values into established theoretical concepts, enhancing interpretability. Laplace-Net promotes transferability, since the system can be rapidly re-trained or fine-tuned for new forcing signals, providing flexibility in applications ranging from controller adaptation to long-horizon forecasting. Experimental results on eight benchmark datasets (including linear, non-linear, and delayed systems) demonstrate the method's improved accuracy and robustness compared to state-of-the-art approaches, particularly in handling complex and previously unseen inputs.
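As background for the Laplace-domain decomposition mentioned above (standard linear-systems theory, not Laplace-Net's learned model), consider a linear time-invariant system $\dot{x}(t) = A x(t) + B u(t - \tau)$ with initial state $x(0) = x_0$, input delay $\tau$, and $u(t) = 0$ for $t < 0$. Taking Laplace transforms gives $sX(s) - x_0 = A X(s) + B e^{-s\tau} U(s)$, so the state separates into an initial-value term and a delayed-input term, both shaped by the internal dynamics $(sI - A)^{-1}$. The symbols $A$, $B$, and $\tau$ are illustrative assumptions; the paper's exact parameterization may differ.

$$
X(s) = \underbrace{(sI - A)^{-1} x_0}_{\text{initial value}} + \underbrace{(sI - A)^{-1} B \, e^{-s\tau} \, U(s)}_{\text{delayed external input}}
$$

Because the input enters only through the second term, a new forcing signal changes $U(s)$ but leaves $(sI - A)^{-1}$ untouched, which is the intuition behind re-using learned internal dynamics for new inputs.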
Submitted 17 March, 2025;
originally announced March 2025.
-
The discovery of a 41s radio pulsar PSR J0311+1402 with ASKAP
Authors:
Yuanming Wang,
Pavan Uttarkar,
Ryan Shannon,
Yu Wing Joshua Lee,
Dougal Dobie,
Ziteng Wang,
Keith Bannister,
Manisha Caleb,
Adam Deller,
Marcin Glowacki,
Joscha Jahns-Schindler,
Tara Murphy,
Reshma Anna-Thomas,
N. D. R. Bhat,
Xinping Deng,
Vivek Gupta,
Akhil Jaini,
Clancy James,
John Tuthill
Abstract:
The emerging population of long-period radio transients (LPTs) shows both similarities to and differences from normal pulsars. A key difference is that their radio emission is too bright to be powered solely by rotational energy. Various models have been proposed (involving either white-dwarf or neutron-star origins), and their nature remains uncertain. Known LPTs have spin periods of minutes to hours, while normal pulsars have periods ranging from milliseconds to seconds. Here, we report the discovery of PSR J0311+1402, an object with an intermediate spin period of 41 seconds, bridging the gap between LPTs and normal pulsars. PSR J0311+1402 exhibits low linear ($\sim25\%$) and circular ($\sim5\%$) polarisation and a relatively steep spectral index ($\sim-2.3$), features similar to those of normal pulsars. However, its observed spin-down properties place it below the pulsar death line, where pair production, and thus radio emission, is expected to cease. The discovery of PSR J0311+1402 suggests the existence of a previously undetected population in this intermediate period range, presumably missed due to selection biases in traditional pulsar search methods. Finding more such objects is important to fill the current gap in neutron star spin periods and to improve our understanding of the relationships between rotation-powered pulsars and LPTs.
△ Less
Submitted 13 April, 2025; v1 submitted 10 March, 2025;
originally announced March 2025.
-
$\kappa$-symmetric M5 brane web for defects in $AdS_7 / CFT_6$ holography
Authors:
Varun Gupta
Abstract:
In this work, we continue our analysis of the general probe M5 brane solutions in $AdS_7 \times S^4$ spacetime from our previous work (arXiv:2109.08551). These solutions are codimension-2 in $AdS_7$ and preserve at least 2 supercharges when the worldvolume 3-form flux field strength is zero. We turn on the field strength and find that the embedding conditions are modified, excluding certain branes contained in the previous result. The new main result is very general, so we pick simpler embedding conditions that describe highly symmetric examples preserving half of the supersymmetry of the 11-dimensional background. When the flux field is zero, the worldvolumes have $AdS_5 \times S^1$ topology. We then turn on a non-zero flux in these examples and analyze how the shape of the worldvolume deforms as supersymmetry is further broken by additional fractions.
Submitted 10 March, 2025;
originally announced March 2025.
-
Examining the Mental Health Impact of Misinformation on Social Media Using a Hybrid Transformer-Based Approach
Authors:
Sarvesh Arora,
Sarthak Arora,
Deepika Kumar,
Vallari Agrawal,
Vedika Gupta,
Dipit Vasdev
Abstract:
Social media has significantly reshaped interpersonal communication, fostering connectivity while also enabling the proliferation of misinformation. The unchecked spread of false narratives has profound effects on mental health, contributing to increased stress, anxiety, and misinformation-driven paranoia. This study presents a hybrid transformer-based approach using a RoBERTa-LSTM classifier to detect misinformation, assess its impact on mental health, and classify disorders linked to misinformation exposure. The proposed models achieve accuracies of 98.4%, 87.8%, and 77.3% in detecting misinformation, identifying mental health implications, and classifying disorders, respectively. Furthermore, Pearson's Chi-Squared Test for Independence (p-value = 0.003871) validates the direct correlation between misinformation and deteriorating mental well-being. This study underscores the urgent need for better misinformation management strategies to mitigate its psychological repercussions. Future research could explore broader datasets incorporating linguistic, demographic, and cultural variables to deepen the understanding of misinformation-induced mental health distress.
Submitted 7 June, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Improving Consistency in Large Language Models through Chain of Guidance
Authors:
Harsh Raj,
Vipul Gupta,
Domenic Rosati,
Subhabrata Majumdar
Abstract:
Consistency is a fundamental dimension of trustworthiness in Large Language Models (LLMs). For humans to be able to trust LLM-based applications, their outputs should be consistent when prompted with inputs that carry the same meaning or intent. Despite this need, there is no known mechanism to control and guide LLMs to be more consistent at inference time. In this paper, we introduce a novel alignment strategy to maximize semantic consistency in LLM outputs. Our proposal is based on Chain of Guidance (CoG), a multistep prompting technique that generates highly consistent outputs from LLMs. For closed-book question-answering (Q&A) tasks, the outputs generated using CoG show improved consistency compared with direct prompting. While other approaches like template-based responses and majority voting may offer alternative paths to consistency, our work focuses on exploring the potential of guided prompting. We use synthetic datasets composed of consistent input-output pairs to fine-tune LLMs to produce consistent and correct outputs. Our fine-tuned models are more than twice as consistent as the base models and show strong generalization capabilities by producing consistent outputs on datasets not used in the fine-tuning process.
Submitted 21 February, 2025;
originally announced February 2025.