-
Robust Reward Modeling via Causal Rubrics
Authors:
Pragya Srivastava,
Harman Singh,
Rahul Madhavan,
Gandharv Patil,
Sravanti Addepalli,
Arun Suggala,
Rengarajan Aravamudhan,
Soumya Sharma,
Anirban Laha,
Aravindan Raghuveer,
Karthikeyan Shanmugam,
Doina Precup
Abstract:
Reward models (RMs) are fundamental to aligning Large Language Models (LLMs) via human feedback, yet they often suffer from reward hacking. They tend to latch on to superficial or spurious attributes, such as response length or formatting, mistaking these cues learned from correlations in training data for the true causal drivers of quality (e.g., factuality, relevance). This occurs because standard training objectives struggle to disentangle these factors, leading to brittle RMs and misaligned policies. We introduce Crome (Causally Robust Reward Modeling), a novel framework grounded in an explicit causal model designed to mitigate reward hacking. Crome employs the following synthetic targeted augmentations during training: (1) Causal Augmentations, which are pairs that differ along specific causal attributes, to enforce sensitivity along each causal attribute individually, and (2) Neutral Augmentations, which are tie-label pairs varying primarily in spurious attributes, to enforce invariance along spurious attributes. Notably, our augmentations are produced without any knowledge of spurious factors, via answer interventions only along causal rubrics, which are identified by querying an oracle LLM. Empirically, Crome significantly outperforms standard baselines on RewardBench, improving average accuracy by up to 5.4% and achieving gains of up to 13.2% and 7.2% in specific categories. The robustness of Crome is further attested by the consistent gains obtained in a Best-of-N inference setting across increasing N, across various benchmarks, including the popular RewardBench (covering chat, chat-hard, safety, and reasoning tasks), the safety-focused WildGuardTest, and the reasoning-specific GSM8k.
Submitted 19 June, 2025;
originally announced June 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
Submitted 17 March, 2025;
originally announced June 2025.
-
Enhancing Situational Awareness in Underwater Robotics with Multi-modal Spatial Perception
Authors:
Pushyami Kaveti,
Ambjorn Grimsrud Waldum,
Hanumant Singh,
Martin Ludvigsen
Abstract:
Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs) demand robust spatial perception capabilities, including Simultaneous Localization and Mapping (SLAM), to support both remote and autonomous tasks. Vision-based systems have been integral to these advancements, capturing rich color and texture at low cost while enabling semantic scene understanding. However, underwater conditions -- such as light attenuation, backscatter, and low contrast -- often degrade image quality to the point where traditional vision-based SLAM pipelines fail. Moreover, these pipelines typically rely on monocular or stereo inputs, limiting their scalability to the multi-camera configurations common on many vehicles. To address these issues, we propose to leverage multi-modal sensing that fuses data from multiple sensors -- including cameras, inertial measurement units (IMUs), and acoustic devices -- to enhance situational awareness and enable robust, real-time SLAM. We explore both geometric and learning-based techniques along with semantic analysis, and conduct experiments on the data collected from a work-class ROV during several field deployments in the Trondheim Fjord. Through our experimental results, we demonstrate the feasibility of real-time reliable state estimation and high-quality 3D reconstructions in visually challenging underwater conditions. We also discuss system constraints and identify open research questions, such as sensor calibration and the limitations of learning-based methods, that merit further exploration to advance large-scale underwater operations.
Submitted 6 June, 2025;
originally announced June 2025.
-
"Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift
Authors:
Harvineet Singh,
Fan Xia,
Alexej Gossmann,
Andrew Chuang,
Julian C. Hong,
Jean Feng
Abstract:
Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight, as they either (i) explain how average performance shifts arise or (ii) identify adversely affected subgroups without insight into how this occurred. To this end, we introduce a Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (Where?) and, if so, dives deeper to ask "Can we explain this using more detailed variable(subset)-specific shifts?" (How?). In real-world experiments, we find that SHIFT identifies interpretable subgroups affected by performance decay, and suggests targeted actions that effectively mitigate the decay.
Submitted 31 May, 2025;
originally announced June 2025.
-
Free Circle Actions on the Product of Three Spheres
Authors:
Dimpi,
Hemant Kumar Singh
Abstract:
The orbit spaces of free S^0-actions on the mod 2 cohomology product of three spheres, S^n x S^m x S^l, 1 <= n <= m <= l have been determined in [6]. In this paper, we extend these findings to free S^1-actions on the rational cohomology product of three spheres. This extension also builds upon the work of Dotzel et al. [7], who studied free circle actions on the rational cohomology product of two spheres. Additionally, we establish Borsuk-Ulam type theorems.
Submitted 20 June, 2025; v1 submitted 28 May, 2025;
originally announced May 2025.
-
Ginsparg-Wilson Hamiltonians with Improved Chiral Symmetry
Authors:
Hersh Singh
Abstract:
We construct a family of Ginsparg-Wilson Hamiltonians with improved chiral properties, starting from a construction of Creutz-Horvath-Neuberger that provides a doubler-free Hamiltonian lattice regularization for Dirac fermions in even spacetime dimensions. We use a higher-order generalization of the Ginsparg-Wilson relation due to Fujikawa, which yields an order-$k$ Hamiltonian overlap operator for each integer $k \geq 0$, with an exactly conserved but nonquantized chiral charge that becomes quantized as $k \to \infty$. Our construction provides physical insight into how Fujikawa's higher-order Ginsparg-Wilson relation improves chiral symmetry while reproducing the anomaly, highlighting the trade-offs inherent in any Hamiltonian lattice realization of an anomalous chiral symmetry. This class of Hamiltonian lattice regularizations, with their tunable chiral symmetry properties, offers potential advantages for quantum and tensor-network simulations.
Submitted 26 May, 2025;
originally announced May 2025.
-
Paired and Unpaired Image to Image Translation using Generative Adversarial Networks
Authors:
Gaurav Kumar,
Soham Satyadharma,
Harpreet Singh
Abstract:
Image to image translation is an active area of research in the field of computer vision, enabling the generation of new images with different styles, textures, or resolutions while preserving their characteristic properties. Recent architectures leverage Generative Adversarial Networks (GANs) to transform input images from one domain to another. In this work, we focus on the study of both paired and unpaired image translation across multiple image domains. For the paired task, we used a conditional GAN model, and for the unpaired task, we trained it using cycle consistency loss. We experimented with different types of loss functions, multiple Patch-GAN sizes, and model architectures. New quantitative metrics - precision, recall, and FID score - were used for analysis. In addition, a qualitative study of the results of different experiments was conducted.
Submitted 22 May, 2025;
originally announced May 2025.
-
A Surrogate Model for the Forward Design of Multi-layered Metasurface-based Radar Absorbing Structures
Authors:
Vineetha Joy,
Aditya Anand,
Nidhi,
Anshuman Kumar,
Amit Sethi,
Hema Singh
Abstract:
Metasurface-based radar absorbing structures (RAS) are highly preferred for applications such as stealth technology and electromagnetic (EM) shielding, owing to their capability to achieve frequency-selective absorption characteristics with minimal thickness and reduced weight penalty. However, the conventional approach for the EM design and optimization of these structures relies on forward simulations, using full-wave simulation tools, to predict the EM response of candidate meta-atoms. This process is computationally intensive, extremely time-consuming, and requires exploration of large design spaces. To overcome this challenge, we propose a surrogate model that significantly accelerates the prediction of EM responses of multi-layered metasurface-based RAS. A convolutional neural network (CNN) based architecture with a Huber loss function has been employed to estimate the reflection characteristics of the RAS model. The proposed model achieved a cosine similarity of 99.9% and a mean square error of 0.001 within 1000 epochs of training. The efficiency of the model has been established via full-wave simulations as well as experiment, where it demonstrated a significant reduction in computational time while maintaining high predictive accuracy.
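The abstract above names two quantities, the Huber loss used for training and the cosine similarity used for evaluation. As a minimal sketch of their standard textbook definitions (not the authors' implementation; the threshold `delta`, the function names, and the sample spectra are assumptions made for illustration), they can be computed as follows:

```python
import math

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for residuals up to delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical reflection spectra (illustrative values only, not from the paper).
simulated = [0.10, 0.35, 0.80, 0.40]
predicted = [0.12, 0.33, 0.78, 0.45]

print("Huber loss:", huber_loss(simulated, predicted))
print("Cosine similarity:", cosine_similarity(simulated, predicted))
```

The Huber loss behaves like a squared error for small residuals and like an absolute error for large ones, which makes it less sensitive to outliers than a plain mean square error.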
Submitted 14 May, 2025;
originally announced May 2025.
-
First Measurement of the Total Compton Scattering Cross Section between 6 and 11 GeV
Authors:
GlueX Collaboration,
F. Afzal,
C. S. Akondi,
M. Albrecht,
M. Amaryan,
S. Arrigo,
V. Arroyave,
A. Asaturyan,
A. Austregesilo,
Z. Baldwin,
F. Barbosa,
J. Barlow,
E. Barriga,
R. Barsotti,
D. Barton,
V. Baturin,
V. V. Berdnikov,
T. Black,
W. Boeglin,
M. Boer,
W. J. Briscoe,
T. Britton,
R. Brunner,
S. Cao,
E. Chudakov, et al. (126 additional authors not shown)
Abstract:
The total cross section for Compton scattering off atomic electrons, $γ+e\rightarrowγ'+e'$, was measured using photons with energies between 6.5 and 11.1~GeV incident on a $^9$Be target as part of the PrimEx-eta experiment in Hall D at Jefferson Lab. This is the first measurement of this fundamental QED process within this energy range. The total uncertainties of the cross section averaged to 3.4% across all energy bins. This not only demonstrates the capability of this experimental setup to perform precision cross-section measurements at forward angles but also allows us to compare with state-of-the-art QED calculations.
Submitted 12 May, 2025;
originally announced May 2025.
-
Kernel Dynamic Mode Decomposition For Sparse Reconstruction of Closable Koopman Operators
Authors:
Nishant Panda,
Himanshu Singh,
J. Nathan Kutz
Abstract:
Spatial-temporal reconstruction of dynamical systems is a crucial problem with diverse applications ranging from climate modeling to numerous chaotic and physical processes. These reconstructions are based on the harmonious relationship between the Koopman operators and the choice of dictionary, determined implicitly by a kernel function. This leads to the approximation of the Koopman operators in a reproducing kernel Hilbert space (RKHS) associated with that kernel function. Data-driven analysis of Koopman operators demands that Koopman operators be closable over the underlying RKHS, which remains an unsettled, unexplored, and critical operator-theoretic challenge. We aim to address this challenge by investigating the embedding of the Laplacian kernel in the measure-theoretic sense, giving rise to an RKHS rich enough to settle the closability of the Koopman operators. We leverage Kernel Extended Dynamic Mode Decomposition with the Laplacian kernel to reconstruct the dominant spatial-temporal modes of various dynamical systems. After the empirical demonstration, we solidify these results by providing a theoretical justification that leverages the closability of the Koopman operators on the RKHS generated by the Laplacian kernel, in terms of the Koopman mode decomposition and the Koopman spectral measure. These results are explored from the standpoints of both operator theory and data-driven science, thus making the Laplacian kernel a robust choice for spatial-temporal reconstruction.
Submitted 10 May, 2025;
originally announced May 2025.
-
Exact islands scenario for CFT systems and critical ratios in higher geometry
Authors:
Harvendra Singh
Abstract:
We study $CFT_d$ systems which are in contact with each other and symmetrically arranged. System B is treated as a bath that surrounds system A in the middle. Our focus is to learn how the entanglement entropy of a bath pair system changes as a function of its size. The total size of systems A and B taken together is kept fixed in this process. It is found that for strip-shaped systems the bath entropy becomes maximum when the respective system sizes follow a Fibonacci-type critical ratio condition. Beyond the critical point, when the bath size increases, the bath entropy starts decreasing, where island and iceberg entropies play an important role. Interestingly, the entire effect of icebergs can be resummed, giving rise to an 'exact island' scenario for $CFT_d$ with $d>2$. Post criticality we also find an important identity involving entropy differences, $S[B]-S[A]=S_l-S_{island}$, where the island contribution is exact. The mutual information of a far-separated bath pair follows a specific law, $I(B:B) \propto {b^2\over (Distance)^d}$. It never vanishes for finite systems. Once the size of system A approaches the Kaluza-Klein scale, the bath entropy becomes discretized. In summary, knowing island corrections is vital for large bath entanglement entropy.
Submitted 2 May, 2025;
originally announced May 2025.
-
Phase-shifting structured illumination with polarization-encoded metasurface
Authors:
Linzhi Yu,
Jesse Pietila,
Haobijam Johnson Singh,
Humeyra Caglayan
Abstract:
Phase-shifting structured illumination is a powerful technique used across diverse imaging modalities, including 3D surface measurement, quantitative phase imaging, and super-resolution microscopy. However, conventional implementations often rely on mechanically driven or optoelectronically complex systems, limiting their compactness, stability, and integration. Here, we present a polarization-controlled dielectric metasurface that generates phase-shifting fringe patterns in the visible spectrum, enabling compact and robust structured light projection. The metasurface encodes distinct phase gratings for orthogonal polarizations, producing fringe patterns with relative lateral displacements that vary according to the polarization of the transmitted light. We experimentally demonstrate high-quality fringe generation and apply the structured illumination in a fringe projection profilometry system for 3D surface measurement of different objects. The metasurface integrates multiple phase-shifting steps into a single static device, offering a millimeter-scale footprint and compatibility with polarization multiplexing. This approach introduces a compact, passive solution for structured light generation with broad potential in next-generation optical metrology and advanced computational imaging.
Submitted 2 May, 2025;
originally announced May 2025.
-
Geolocating Earth Imagery from ISS: Integrating Machine Learning with Astronaut Photography for Enhanced Geographic Mapping
Authors:
Vedika Srivastava,
Hemant Kumar Singh,
Jaisal Singh
Abstract:
This paper presents a novel approach to geolocating images captured from the International Space Station (ISS) using advanced machine learning algorithms. Despite having precise ISS coordinates, the specific Earth locations depicted in astronaut-taken photographs often remain unidentified. Our research addresses this gap by employing three distinct image processing pipelines: a Neural Network (NN) based approach, a SIFT-based method, and a GPT-4 model. Each pipeline is tailored to process high-resolution ISS imagery, identifying both natural and man-made geographical features. Through extensive evaluation on a diverse dataset of over 140 ISS images, our methods demonstrate significant promise in automated geolocation with varied levels of success. The NN approach showed a high success rate in accurately matching geographical features, while the SIFT pipeline excelled in processing zoomed-in images. The GPT-4 model provided enriched geographical descriptions alongside location predictions. This research contributes to the fields of remote sensing and Earth observation by enhancing the accuracy and efficiency of geolocating space-based imagery, thereby aiding environmental monitoring and global mapping efforts.
Submitted 29 April, 2025;
originally announced April 2025.
-
Qubit-efficient quantum chemistry with the ADAPT variational quantum eigensolver and double unitary downfolding
Authors:
Harjeet Singh,
Luke W. Bertels,
Daniel Claudino,
Sophia E. Economou,
Edwin Barnes,
Nicholas P. Bauman,
Karol Kowalski,
Nicholas J. Mayhall
Abstract:
In this work, we combine the recently developed double unitary coupled cluster (DUCC) theory with the adaptive, problem-tailored variational quantum eigensolver (ADAPT-VQE) to explore the accuracy of unitary downfolded Hamiltonians for quantum simulation of chemistry. We benchmark the ability of DUCC effective Hamiltonians to recover dynamical correlation energy outside of an active space. We consider the effects of strong correlation, commutator truncation, higher-body terms, and approximate external amplitudes on the accuracy of these effective Hamiltonians. When combining these DUCC Hamiltonians with ADAPT-VQE, we observe convergence of the ground state similar to that of bare active space Hamiltonians, demonstrating that DUCC Hamiltonians provide increased accuracy without increasing the load on the quantum processor.
Submitted 25 April, 2025;
originally announced April 2025.
-
Causal-Copilot: An Autonomous Causal Analysis Agent
Authors:
Xinyue Wang,
Kun Zhou,
Wenyi Wu,
Har Simrat Singh,
Fang Nan,
Songyao Jin,
Aryan Philip,
Saloni Patnaik,
Hou Zhu,
Shivam Singh,
Parjanya Prashant,
Qian Shen,
Biwei Huang
Abstract:
Causal analysis plays a foundational role in scientific discovery and reliable decision-making, yet it remains largely inaccessible to domain experts due to its conceptual and algorithmic complexity. This disconnect between causal methodology and practical usability presents a dual challenge: domain experts are unable to leverage recent advances in causal learning, while causal researchers lack broad, real-world deployment to test and refine their methods. To address this, we introduce Causal-Copilot, an autonomous agent that operationalizes expert-level causal analysis within a large language model framework. Causal-Copilot automates the full pipeline of causal analysis for both tabular and time-series data -- including causal discovery, causal inference, algorithm selection, hyperparameter optimization, result interpretation, and generation of actionable insights. It supports interactive refinement through natural language, lowering the barrier for non-specialists while preserving methodological rigor. By integrating over 20 state-of-the-art causal analysis techniques, our system fosters a virtuous cycle -- expanding access to advanced causal methods for domain experts while generating rich, real-world applications that inform and advance causal theory. Empirical evaluations demonstrate that Causal-Copilot achieves superior performance compared to existing baselines, offering a reliable, scalable, and extensible solution that bridges the gap between theoretical sophistication and real-world applicability in causal analysis. A live interactive demo of Causal-Copilot is available at https://causalcopilot.com/.
Submitted 21 April, 2025; v1 submitted 17 April, 2025;
originally announced April 2025.
-
Physical Parameters of Stars in NGC 6397 Using ANN-Based Interpolation and Full Spectrum Fitting
Authors:
Nitesh Kumar,
Philippe Prugniel,
Harinder P. Singh
Abstract:
Stellar spectral interpolation is a critical technique employed by fitting software to derive the physical parameters of stars. This approach is necessary because on-the-fly generation of synthetic stellar spectra is not possible due to the complexity and high cost of the computation. The goal of this study is to develop a spectral interpolator for a synthetic spectral library using artificial neural networks (ANNs). The study aims to test the accuracy of the trained interpolator through self-inversion and, subsequently, to utilize the interpolator to derive the physical parameters of stars in the globular cluster NGC 6397 using spectra obtained from the Multi Unit Spectroscopic Explorer (MUSE) on the Very Large Telescope (VLT). In this study, ANNs were trained to function as spectral interpolators. The ULySS full-spectrum fitting package, integrated with the trained interpolators, was then used to extract the physical parameters of 1587 spectra of 1063 stars in NGC 6397. The trained ANN interpolator achieved precise determination of stellar parameters with a mean difference of 31 K for $T_{\rm eff}$ and 0.01 dex for [Fe/H] compared to previous studies. This study demonstrates the efficacy of ANN-based spectral interpolation in stellar parameter determination, offering faster and more accurate analysis.
Submitted 12 April, 2025;
originally announced April 2025.
-
Seismic constraints on the spin evolution of slowly rotating young intermediate-mass stars
Authors:
K. H. Singh,
S. K. Panda,
S. M. Hanasoge,
S. Dhanpal
Abstract:
$δ$ Scuti stars are hot, rapid rotators and are a poorly understood class of pulsators. Asteroseismology provides the only means with which to probe their interior dynamics. However, their complex and unexplained oscillation patterns restrict analyses to only a small fraction with interpretable pulsations. Here, we identify 5381 $δ$ Scuti stars from 63 sectors of TESS observations, of which 300 had interpretable oscillations, with 24 showing rotational splittings. We inferred compositions and ages ($τ$) for the 300 stars finding them in near-ZAMS states (Bedding et al. 2020), and measured the mean envelope rotation rates ($< f_{rot} >$) for 24 of them. Analyzing their age-dependent rotation, we found these stars essentially exhibit weak-to-no spindown, while evolving past the ZAMS across a narrow time-span during which they show regular pulsations. A quantitative fit to their spin-evolution results in a trend $f_{rot} (d^{-1}) \propto (τ/{Gyr})^{-0.048 \pm 0.016}$, much slower than the spindown of cooler late-type stars due to magnetic braking (Skumanich's law: $f_{rot} (d^{-1}) \propto (τ/{Gyr})^{-0.5}$). Based on stellar evolution calculations, we show this weak spindown is consistent with the gradual increase in their moment-of-inertia.
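To give a feel for how different the two power laws quoted above are, the short sketch below evaluates the factor by which $f_{rot}$ changes when the age grows by a given factor, using only the central exponents from the abstract. The tenfold age factor and the function name are assumptions chosen for illustration; they are not taken from the paper, and the fitted trend is reported only over a narrow time-span.

```python
def spin_ratio(age_factor, exponent):
    """Ratio f_rot(tau2) / f_rot(tau1) for f_rot proportional to tau**exponent,
    where tau2 = age_factor * tau1."""
    return age_factor ** exponent

DELTA_SCUTI_EXPONENT = -0.048  # central value of the fit quoted in the abstract
SKUMANICH_EXPONENT = -0.5      # Skumanich's law, as quoted in the abstract

age_factor = 10.0  # assumed tenfold increase in age, for illustration only

print("delta Scuti fit:", spin_ratio(age_factor, DELTA_SCUTI_EXPONENT))
print("Skumanich's law:", spin_ratio(age_factor, SKUMANICH_EXPONENT))
```

Under these assumptions the fitted trend retains roughly 90% of the rotation frequency over a tenfold increase in age, whereas Skumanich's law retains roughly 32%.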
Submitted 5 May, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
Grade Guard: A Smart System for Short Answer Automated Grading
Authors:
Niharika Dadu,
Harsh Vardhan Singh,
Romi Banerjee
Abstract:
The advent of large language models (LLMs) in the education sector has provided impetus to automate grading short answer questions. LLMs make evaluating short answers very efficient, thus addressing issues like staff shortage. However, in the task of Automated Short Answer Grading (ASAG), LLM responses are influenced by diverse perspectives in their training dataset, leading to inaccuracies in evaluating nuanced or partially correct answers. To address this challenge, we propose a novel framework, Grade Guard.
1. To enhance the task-based specialization of the LLMs, the temperature parameter has been fine-tuned using Root Mean Square Error (RMSE).
2. Unlike traditional approaches, LLMs in Grade Guard compute an Indecisiveness Score (IS) along with the grade to reflect uncertainty in predicted grades.
3. A Confidence-Aware Loss (CAL) is introduced to generate an optimized Indecisiveness Score (IS).
4. To improve reliability, self-reflection based on the optimized IS has been introduced into the framework, enabling human re-evaluation to minimize incorrect grade assignments.
Our experimentation shows that the best setting of Grade Guard outperforms traditional methods by 19.16% RMSE in Upstage Solar Pro, 23.64% RMSE in Upstage Solar Mini, 4.00% RMSE in Gemini 1.5 Flash, and 10.20% RMSE in GPT-4o Mini. Future work includes improving interpretability by generating rationales for grades to enhance accuracy. Expanding benchmark datasets and annotating them with domain-specific nuances will enhance grading accuracy. Finally, analyzing feedback to enhance confidence in predicted grades, reduce biases, optimize grading criteria, and personalize learning while supporting multilingual grading systems will make the solution more accurate, adaptable, fair, and inclusive.
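The first step of the framework, choosing a temperature by RMSE, can be sketched as a simple grid search. The abstract does not say how the tuning is carried out, so the grid search, the function names (`rmse`, `select_temperature`), and the deterministic `toy_grader` standing in for an LLM call are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between predicted and reference grades."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))

def select_temperature(
    candidates: Sequence[float],
    grade_fn: Callable[[str, float], float],
    answers: Sequence[str],
    reference: Sequence[float],
) -> Tuple[float, float]:
    """Return the candidate temperature with the lowest RMSE, and that RMSE."""
    best_t, best_err = candidates[0], math.inf
    for t in candidates:
        predictions = [grade_fn(answer, t) for answer in answers]
        err = rmse(predictions, reference)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

def toy_grader(answer: str, temperature: float) -> float:
    """Hypothetical stand-in for an LLM grader (no model is called).

    The grade is the word count capped at 5, plus a bias that grows as the
    temperature moves away from 0.3.
    """
    return min(len(answer.split()), 5) + 2.0 * abs(temperature - 0.3)

answers = ["photosynthesis makes glucose", "plants eat light", "no idea"]
reference = [3.0, 3.0, 2.0]  # made-up human grades for the toy data
best_t, best_err = select_temperature([0.0, 0.3, 0.7, 1.0], toy_grader, answers, reference)
print(f"selected temperature={best_t}, RMSE={best_err:.3f}")
```

In a real setting `grade_fn` would wrap a model call, and the selection would be made on held-out graded answers rather than on the same toy data.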
Submitted 1 April, 2025;
originally announced April 2025.
-
Giant Spin Pumping at Polymer/Ferromagnet Interfaces for Hybrid Spintronic Devices
Authors:
Shiva Gaur,
Akash Kumar,
Himanshu Bangar,
Utkarsh Shashank,
Hukum Singh,
Saroj P. Dash,
Anubhav Raghav,
Johan Åkerman
Abstract:
While the growing utilization of polymers in flexible electronic devices has sparked significant interest in polymer/metal interfaces, spintronic studies of such interfaces remain limited. Here, we systematically study spin pumping across a polymer/ferromagnet metal interface between hydrogen silsesquioxane (HSQ) oligomer layers ($t_\mathit{HSQ} = 30, 36, 48$ nm) and NiFe ($t_\mathit{NiFe} = 4, 5, 7, 10$ nm) thin films. Using ferromagnetic resonance measurements, we observe strong spin pumping (large linewidth broadening) and a giant spin mixing conductance, reaching 19.8 ${\rm nm^{-2}}$ for HSQ = 48 nm, i.e., comparable to that of heavy metals. Our results suggest efficient spin transfer across the HSQ/NiFe interface, possibly originating from a combination of spin and orbital pumping, and provide valuable insights for designing self-powered and flexible spintronic devices utilizing polymers in combination with ferromagnetic materials.
Submitted 28 March, 2025;
originally announced March 2025.
-
Gemma 3 Technical Report
Authors:
Gemma Team,
Aishwarya Kamath,
Johan Ferret,
Shreya Pathak,
Nino Vieillard,
Ramona Merhej,
Sarah Perrin,
Tatiana Matejovicova,
Alexandre Ramé,
Morgane Rivière,
Louis Rouillard,
Thomas Mesnard,
Geoffrey Cideron,
Jean-bastien Grill,
Sabela Ramos,
Edouard Yvinec,
Michelle Casbon,
Etienne Pot,
Ivo Penchev,
Gaël Liu,
Francesco Visin,
Kathleen Kenealy,
Lucas Beyer,
Xiaohai Zhai,
Anton Tsitsulin
, et al. (191 additional authors not shown)
Abstract:
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, wider language coverage, and a longer context of at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction fine-tuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Submitted 25 March, 2025;
originally announced March 2025.
-
Blockwise Optimization for Projective Variational Quantum Dynamics (BLOP-VQD): Algorithm and Implementation for Lattice Systems
Authors:
Harshdeep Singh,
Sonjoy Majumder,
Sabyashachi Mishra
Abstract:
We present an efficient approach to simulate real-time quantum dynamics using Projected Variational Quantum Dynamics (PVQD), where the computational cost is reduced by strategically optimizing only a subset of the variational parameters at each time step. Typically, the variational ansatz consists of repeated blocks of parameterized quantum circuits, where all parameters are updated in a standard optimization procedure. In contrast, our method selectively optimizes one block at a time while keeping the others fixed, allowing for significant reductions in computational overhead. This semi-global optimization strategy ensures that all qubits are still involved in the evolution, but the optimization is localized to specific blocks, thus avoiding the need to update all parameters simultaneously. We propose different approaches for choosing the next block for optimization, including sequential, random, and fidelity-based updates. We demonstrate the performance of the proposed methods in a series of spin-lattice models with varying sizes and complexity. Our method preserves the accuracy of the time evolution with a much lower computational cost. This new optimization strategy provides a promising path toward high-fidelity simulation of the time evolution of complex quantum systems with reduced computational resources.
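The idea of optimizing one parameter block at a time can be illustrated with a purely classical toy problem. The sketch below is not the authors' algorithm and involves no quantum circuit: a quadratic cost stands in for the infidelity, all function names are hypothetical, and the "greedy" rule (pick the block contributing most to the cost) is only an assumed stand-in for the fidelity-based selection, whose details the abstract does not give.

```python
import random
from typing import List

def cost(params: List[float], target: List[float]) -> float:
    """Toy surrogate for infidelity: squared distance to a target vector."""
    return sum((p - t) ** 2 for p, t in zip(params, target))

def block_cost(params: List[float], target: List[float], block: List[int]) -> float:
    """Contribution of a single block of parameters to the toy cost."""
    return sum((params[i] - target[i]) ** 2 for i in block)

def optimize_block(params: List[float], target: List[float], block: List[int],
                   lr: float = 0.25, steps: int = 20) -> None:
    """Gradient descent on the parameters of one block; all others stay fixed."""
    for _ in range(steps):
        for i in block:
            params[i] -= lr * 2.0 * (params[i] - target[i])

def choose_block(strategy: str, step: int, blocks: List[List[int]],
                 params: List[float], target: List[float],
                 rng: random.Random) -> int:
    """Select the index of the next block to optimize."""
    if strategy == "sequential":
        return step % len(blocks)
    if strategy == "random":
        return rng.randrange(len(blocks))
    if strategy == "greedy":  # assumed stand-in for a fidelity-based rule
        return max(range(len(blocks)),
                   key=lambda b: block_cost(params, target, blocks[b]))
    raise ValueError(f"unknown strategy: {strategy}")

def run(strategy: str, n_blocks: int = 3, block_size: int = 2,
        rounds: int = 6, seed: int = 0) -> float:
    """Run block-wise optimization and return the final toy cost."""
    rng = random.Random(seed)
    n_params = n_blocks * block_size
    blocks = [list(range(b * block_size, (b + 1) * block_size))
              for b in range(n_blocks)]
    params = [0.0] * n_params
    target = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
    for step in range(rounds):
        b = choose_block(strategy, step, blocks, params, target, rng)
        optimize_block(params, target, blocks[b])
    return cost(params, target)

for name in ("sequential", "random", "greedy"):
    print(f"{name}: final toy cost = {run(name):.3e}")
```

Each round touches only `block_size` parameters instead of all of them, which is the source of the cost saving in this toy setting; whether a given selection rule visits every block often enough is exactly what the choice of strategy controls.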
Submitted 23 March, 2025;
originally announced March 2025.
-
Meta-operators for all-optical image processing
Authors:
Linzhi Yu,
Haobijam J. Singh,
Jesse Pietila,
Humeyra Caglayan
Abstract:
All-optical image processing offers a high-speed, energy-efficient alternative to conventional electronic systems by leveraging the wave nature of light for parallel computation. However, traditional optical processors rely on bulky components, limiting scalability and integration. Here, we demonstrate a compact metasurface-based platform for analog optical computing. By employing double-phase encoding and polarization multiplexing, our approach enables arbitrary image transformations within a single passive nanophotonic device, eliminating the need for complex optical setups or digital post-processing. We experimentally showcase key computational operations, including first-order differentiation, cross-correlation, vertex detection, and Laplacian differentiation. Additionally, we extend this framework to high-resolution 3D holography, achieving subwavelength-scale volumetric wavefront control for depth-resolved reconstructions with high fidelity. Our results establish a scalable and versatile approach to computational optics, with applications including real-time image processing, energy-efficient computing, biomedical imaging, high-fidelity holographic displays, and optical data storage, driving the advancement of intelligent optical processors.
Submitted 15 March, 2025;
originally announced March 2025.
-
DCAT: Dual Cross-Attention Fusion for Disease Classification in Radiological Images with Uncertainty Estimation
Authors:
Jutika Borah,
Hidam Kumarjit Singh
Abstract:
Accurate and reliable image classification is crucial in radiology, where diagnostic decisions significantly impact patient outcomes. Conventional deep learning models tend to produce overconfident predictions despite underlying uncertainties, potentially leading to misdiagnoses. Attention mechanisms have emerged as powerful tools in deep learning, enabling models to focus on relevant parts of the input data. Combined with feature fusion, they can be effective in addressing uncertainty challenges. Cross-attention has become increasingly important in medical image analysis for capturing dependencies across features and modalities. This paper proposes a novel dual cross-attention fusion model for medical image analysis that addresses key challenges in feature integration and interpretability. Our approach introduces a bidirectional cross-attention mechanism with refined channel and spatial attention that dynamically fuses feature maps from EfficientNetB4 and ResNet34, leveraging multi-network contextual dependencies. The features refined through channel and spatial attention highlight discriminative patterns crucial for accurate classification. The proposed model achieved an AUC of 99.75%, 100%, 99.93%, and 98.69% and an AUPR of 99.81%, 100%, 99.97%, and 96.36% on Covid-19, Tuberculosis, and Pneumonia chest X-ray images and retinal OCT images, respectively. The entropy values and several high-uncertainty samples give an interpretable visualization from the model, enhancing transparency. By combining multi-scale feature extraction, bidirectional attention, and uncertainty estimation, our proposed model strongly impacts medical image analysis.
Submitted 19 March, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Mixed-state learnability transitions in monitored noisy quantum dynamics
Authors:
Hansveer Singh,
Romain Vasseur,
Andrew C. Potter,
Sarang Gopalakrishnan
Abstract:
We consider learnability transitions in monitored quantum systems that undergo noisy evolution, subject to a global strong symmetry -- i.e., in addition to the measuring apparatus, the system can interact with an unobserved environment, but does not exchange charge with it. As in the pure-state setting, we find two information-theoretic phases -- a sharp (fuzzy) phase in which an eavesdropper can rapidly (slowly) learn the symmetry charge. However, because the dynamics is noisy, both phases can be simulated efficiently using tensor networks. Indeed, even when the true dynamics is unitary, introducing noise by hand allows an eavesdropper to efficiently learn the symmetry charge from local measurements, as we demonstrate. We identify the fuzzy phase in this setting as a mixed-state phase that exhibits spontaneous strong-to-weak symmetry breaking.
Submitted 13 March, 2025;
originally announced March 2025.
-
Generalized Ginsparg-Wilson relations: Fermionic anomalies on the lattice
Authors:
Hersh Singh
Abstract:
The Ginsparg-Wilson (GW) relation elegantly captures how the anomalous chiral symmetry of a Dirac fermion manifests on the lattice. In this talk, we discuss how the GW relation and its closed-form solution, the overlap operator, can be generalized to Majorana or Dirac fermions in any dimension for finite symmetry transformations (continuous or discrete). We find an exact symmetry which reproduces both perturbative and global anomalies on the lattice. These generalized GW fermions are boundary theories of various bulk symmetry-protected topological phases and thus provide an explicit lattice realization of the fermionic bulk-boundary correspondence central to recent proposals for chiral gauge theories on the lattice.
Submitted 29 March, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
Static Program Analysis Guided LLM Based Unit Test Generation
Authors:
Sujoy Roychowdhury,
Giriprasad Sridhara,
A K Raghavan,
Joy Bose,
Sourav Mazumdar,
Hamender Singh,
Srinivasan Bajji Sugumaran,
Ricardo Britto
Abstract:
We describe a novel approach to automating unit test generation for Java methods using large language models (LLMs). Existing LLM-based approaches rely on sample usage(s) of the method to test (focal method) and/or provide the entire class of the focal method as input prompt and context. The former approach is often not viable due to the lack of sample usages, especially for newly written focal methods. The latter approach does not scale well enough; the bigger the complexity of the focal method and larger associated class, the harder it is to produce adequate test code (due to factors such as exceeding the prompt and context lengths of the underlying LLM). We show that augmenting prompts with concise and precise context information obtained by program analysis increases the effectiveness of generating unit test code through LLMs. We validate our approach on a large commercial Java project and a popular open-source Java project.
Submitted 7 March, 2025;
originally announced March 2025.
-
Optically Detected Magnetic Resonance Imaging and Sensing Within Functionalized Additively Manufactured Microporous Structures
Authors:
Brian W. Blankenship,
Yoonsoo Rho,
Zachary Jones,
Timon Meier,
Runxuan Li,
Emanuel Druga,
Harpreet Singh,
Xiaoxing Xia,
Ashok Ajoy,
Costas P. Grigoropoulos
Abstract:
Quantum sensing with nitrogen-vacancy centers in diamond has emerged as a powerful tool for measuring diverse physical parameters, yet the versatility of these measurement approaches is often limited by the achievable layout and dimensionality of bulk-crystal platforms. Here, we demonstrate a versatile approach to creating designer quantum sensors by surface-functionalizing multiphoton lithography microstructures with NV-containing nanodiamonds. We showcase this capability by fabricating a 150 $μ$m x 150 $μ$m x 150 $μ$m triply periodic minimal surface gyroid structure with millions of attached nanodiamonds. We demonstrate a means to volumetrically image these structures using a refractive index matching confocal imaging technique, and extract ODMR spectra from 1.86 $μ$m x 1.86 $μ$m areas of highly concentrated nanodiamonds across a cross section of the gyroid. Furthermore, the high density of sensing elements enables ensemble temperature measurements with sensitivity of 0.548 K/$\sqrt{\mathrm{Hz}}$ at 5 mW excitation power. This approach to creating quantum-enabled microarchitectures opens new possibilities for multimodal sensing in complex three-dimensional environments.
Submitted 22 February, 2025;
originally announced February 2025.
-
Even denominator fractional quantum Hall states in the zeroth Landau level of monolayer-like band of ABA trilayer graphene
Authors:
Tanima Chanda,
Simrandeep Kaur,
Harsimran Singh,
Kenji Watanabe,
Takashi Taniguchi,
Manish Jain,
Udit Khanna,
Ajit C. Balram,
Aveek Bid
Abstract:
The fractional quantum Hall (FQH) effect is a macroscopic manifestation of strong electron-electron interactions. Even-denominator FQH states (FQHSs) at half-filling are particularly interesting as they are predicted to host non-Abelian excitations with non-trivial braiding statistics. Such states are predominantly observed in the $N=1$ Landau level (LL) of semiconductors such as GaAs. In this Letter, we report the unanticipated observation of even-denominator FQHSs in the $N=0$ LL of ABA trilayer graphene (TLG), a system characterized by tunable LL mixing and the absence of inversion symmetry. Notably, we find robust FQHSs at $ν=5/2$ and $ν=7/2$ when two LLs, originating from a monolayer-like band of TLG with different isospin indices, cross each other. These are flanked by the Levin-Halperin daughter states at $ν=7/13$ and $ν=9/17$, respectively, and further away, the standard Jain sequence of composite fermions (CFs) is observed. The even-denominator FQHSs and their accompanying daughter states become stronger with increasing magnetic fields, while concomitantly, a weakening of the CF states is observed. We posit that the absence of inversion symmetry in the system gives rise to additional isospin interactions, which enhance LL mixing and soften the short-range part of the Coulomb repulsion, stabilizing the even-denominator FQHSs. In addition, we demonstrate that these states, along with their daughter states, can be finely tuned with an external displacement field that serves as an important tool to control the LL mixing in the system.
Submitted 10 February, 2025;
originally announced February 2025.
-
Multifunctional meta-optic azimuthal shear interferometer
Authors:
Linzhi Yu,
Sergei Shevtsov,
Haobijam Johnson Singh,
Peter G. Kazansky,
Humeyra Caglayan
Abstract:
Azimuthal shear interferometry is a versatile tool for analyzing wavefront asymmetries. However, conventional systems are bulky, alignment-sensitive, and prone to nonuniform shear. We present a broadband, compact, and robust meta-optics-based azimuthal shear interferometer in a common-path configuration, reducing the system size to the millimeter scale. Unlike conventional designs, the meta-optic azimuthal shear interferometer utilizes the localized wavefront modulation capabilities of meta-optics to achieve uniform azimuthal shear displacement independent of radial position, significantly enhancing accuracy and stability. Our approach eliminates the need for bulky optical components and precise multi-path alignment, making it more resilient to environmental disturbances. Its multifunctionality is demonstrated through applications in all-optical edge detection, differential interference contrast microscopy, and aberrated wavefront sensing. These results underscore its potential for real-time analog image processing, advanced optical imaging, and optical testing.
Submitted 8 February, 2025;
originally announced February 2025.
-
Innovative Framework for Early Estimation of Mental Disorder Scores to Enable Timely Interventions
Authors:
Himanshi Singh,
Sadhana Tiwari,
Sonali Agarwal,
Ritesh Chandra,
Sanjay Kumar Sonbhadra,
Vrijendra Singh
Abstract:
An individual's general well-being is greatly impacted by mental health conditions including depression and Post-Traumatic Stress Disorder (PTSD), underscoring the importance of early detection and precise diagnosis in order to facilitate prompt clinical intervention. An advanced multimodal deep learning system for the automated classification of PTSD and depression is presented in this paper. Utilizing textual and audio data from clinical interview datasets, the method combines features taken from both modalities by combining the architectures of LSTM (Long Short-Term Memory) and BiLSTM (Bidirectional Long Short-Term Memory). While text features focus on speech's semantic and grammatical components, audio features capture vocal traits including rhythm, tone, and pitch. This combination of modalities enhances the model's capacity to identify minute patterns connected to mental health conditions. Using test datasets, the proposed method achieves classification accuracies of 92% for depression and 93% for PTSD, outperforming traditional unimodal approaches and demonstrating its accuracy and robustness.
Submitted 6 February, 2025;
originally announced February 2025.
-
Multimodal Data-Driven Classification of Mental Disorders: A Comprehensive Approach to Diagnosing Depression, Anxiety, and Schizophrenia
Authors:
Himanshi Singh,
Sadhana Tiwari,
Sonali Agarwal,
Ritesh Chandra,
Sanjay Kumar Sonbhadra,
Vrijendra Singh
Abstract:
This study investigates the potential of multimodal data integration, which combines electroencephalogram (EEG) data with sociodemographic characteristics like age, sex, education, and intelligence quotient (IQ), to diagnose mental diseases like schizophrenia, depression, and anxiety. Using Apache Spark and convolutional neural networks (CNNs), a data-driven classification pipeline has been developed for big data environments to effectively analyze massive datasets. In order to evaluate brain activity and connection patterns associated with mental disorders, EEG parameters such as power spectral density (PSD) and coherence are examined. The importance of coherence features is highlighted by comparative analysis, which shows significant improvement in classification accuracy and robustness. This study emphasizes the significance of holistic approaches for efficient diagnostic tools by integrating a variety of data sources. The findings open the door for creative, data-driven approaches to treating psychiatric diseases by demonstrating the potential of utilizing big data, sophisticated deep learning methods, and multimodal datasets to enhance the precision, usability, and comprehension of mental health diagnostics.
Submitted 6 February, 2025;
originally announced February 2025.
-
Multi-target Range, Doppler and Angle estimation in MIMO-FMCW Radar with Limited Measurements
Authors:
Chandrashekhar Rai,
Himali Singh,
Arpan Chattopadhyay
Abstract:
Multiple-input multiple-output (MIMO) radar offers several performance and flexibility advantages over traditional radar arrays. However, high angular and Doppler resolutions necessitate a large number of antenna elements and the transmission of numerous chirps, leading to increased hardware and computational complexity. While compressive sensing (CS) has recently been applied to pulsed-waveform radars with sparse measurements, its application to frequency-modulated continuous wave (FMCW) radar for target detection remains largely unexplored. In this paper, we propose a novel CS-based multi-target localization algorithm in the range, Doppler, and angular domains for MIMO-FMCW radar, where we jointly estimate targets' velocities and angles of arrival. To this end, we present a signal model for sparse-random and uniform linear arrays based on three-dimensional spectral estimation. For range estimation, we propose discrete Fourier transform (DFT)-based focusing and orthogonal matching pursuit (OMP)-based techniques, each with distinct advantages, while two-dimensional CS is used for joint Doppler-angle estimation. Leveraging the properties of structured random matrices, we establish theoretical uniform and non-uniform recovery guarantees with high probability for the proposed framework. Our numerical experiments demonstrate that our methods achieve similar detection performance and higher resolution compared to conventional DFT and MUSIC with fewer transmitted chirps and antenna elements.
Submitted 3 February, 2025;
originally announced February 2025.
-
Convolutions with Radio-Frequency Spin-Diodes
Authors:
Erwann Plouet,
Hanuman Singh,
Pankaj Sethi,
Frank A. Mizrahi,
Dedalo Sanz-Hernandez,
Julie Grollier
Abstract:
The classification of radio-frequency (RF) signals is crucial for applications in robotics, traffic control, and medical devices. Spintronic devices, which respond to RF signals via ferromagnetic resonance, offer a promising solution. Recent studies have shown that a neural network of nanoscale magnetic tunnel junctions can classify RF signals without digitization. However, the complexity of these junctions poses challenges for rapid scaling. In this work, we demonstrate that simple spintronic devices, known as metallic spin-diodes, can effectively perform RF classification. These devices consist of NiFe/Pt bilayers and can implement weighted sums of RF inputs. We experimentally show that chains of four spin-diodes can execute 2x2 pixel filters, achieving high-quality convolutions on the Fashion-MNIST dataset. Integrating the hardware spin-diodes in a software network, we achieve a top-1 accuracy of 88% on the first 100 images, compared to 88.4% for full software with noise, and 90% without noise.
Submitted 27 January, 2025;
originally announced January 2025.
-
A theoretical framework for BL Her stars IV. New period-luminosity relations in the Rubin-LSST filters
Authors:
Susmita Das,
László Molnár,
Róbert Szabó,
Harinder P. Singh,
Shashi M. Kanbur,
Anupam Bhardwaj,
Marcella Marconi,
Radoslaw Smolec
Abstract:
We present new theoretical light curves in the Rubin-LSST filters for a fine grid of BL Her models computed using MESA-RSP. We also derive new theoretical period-luminosity (PL) and period-Wesenheit (PW) relations in the Rubin-LSST filters with the goal of studying the effect of convection parameters and metallicity on these relations. The grid of BL Her models was computed with the input stellar parameters: metallicity ($-2.0\; \mathrm{dex} \leq \mathrm{[Fe/H]} \leq 0.0\; \mathrm{dex}$), stellar mass ($0.5M_{\odot}-0.8M_{\odot}$), stellar luminosity ($50L_{\odot}-300L_{\odot}$), and effective temperature (across the full extent of the instability strip; in steps of 50 K) and using four sets of convection parameters. Bolometric correction tables from MIST were used to transform the theoretical bolometric light curves of the BL Her models into the Rubin-LSST ugrizy filters. The PL relations of the BL Her models exhibit steeper slopes but smaller dispersion with increasing wavelengths in the Rubin-LSST filters. The PL and PW slopes for the complete set of BL Her models computed with radiative cooling (sets B and D) are statistically similar across the grizy filters. The BL Her models exhibit a weak or negligible effect of metallicity on the PL relations for wavelengths longer than the g filter, both for the complete set of models and for the low-mass models. However, we find a significant effect of metallicity on the PL relation in the u filter. Strong metallicity effects are observed in the PWZ relations involving the u filter and are found to have a significant contribution from the high-metallicity BL Her models. Due to the negligible metallicity effect for relations involving the Wesenheit indices $W(i,g-i)$, $W(z,i-z)$ and $W(y,g-y)$, we recommend these filter combinations for BL Her stars, when observed with the Rubin-LSST, to be used as reliable standard candles.
Submitted 16 January, 2025;
originally announced January 2025.
-
Enhancing AI Safety Through the Fusion of Low Rank Adapters
Authors:
Satya Swaroop Gudipudi,
Sreeram Vipparla,
Harpreet Singh,
Shashwat Goel,
Ponnurangam Kumaraguru
Abstract:
Instruction fine-tuning of large language models (LLMs) is a powerful method for improving task-specific performance, but it can inadvertently lead to a phenomenon where models generate harmful responses when faced with malicious prompts. In this paper, we explore Low-Rank Adapter Fusion (LoRA) as a means to mitigate these risks while preserving the model's ability to handle diverse instructions effectively. Through an extensive comparative analysis against established baselines using recognized benchmark datasets, we demonstrate a 42\% reduction in the harmfulness rate by leveraging LoRA fusion between a task adapter and a safety adapter, the latter of which is specifically trained on our safety dataset. However, we also observe exaggerated safety behaviour, where the model rejects safe prompts that closely resemble unsafe ones.
Submitted 30 December, 2024;
originally announced January 2025.
-
Large spin accumulation signals in ultrafast magneto-optical experiments
Authors:
Alberto Anadón,
Harjinder Singh,
Eva Díaz,
Yann Le-Guen,
Julius Hohlfeld,
Richard B. Wilson,
Gregory Malinowski,
Michel Hehn,
Jon Gorchon
Abstract:
Magneto-optical techniques have become essential tools in spintronics, enabling the investigation of spin dynamics in the ultrafast regime. A key challenge in this field has been to accurately isolate the contributions to magneto-optical signals of spin transport phenomena from the local magnetization dynamics. The contribution of transported and accumulated spins was long believed to be orders of magnitude smaller than that of the magnetization and thus previous approaches to disentangle these signals have relied on specific experimental designs, usually including thick metal layers. Here, we present experimental evidence demonstrating that the magneto-optical signal from ultrafast spin accumulations can, under certain conditions, be comparable to or even exceed that of the magnetic layer in a standard ultrafast demagnetization experiment. Our findings provide a new framework for accessing and isolating these spin accumulations, allowing for time and depth dependent probing of transported spin and/or orbital angular momentum.
Submitted 10 January, 2025;
originally announced January 2025.
-
On semicommutativity of rings relative to hypercenter
Authors:
Nazeer Ansari,
Kh. Herachandra Singh
Abstract:
Armendariz and semicommutative rings are generalizations of reduced rings. In \cite{IN}, I.N. Herstein introduced the notion of a hypercenter of a ring to generalize the center subclass. For a ring $R$, an element $a \in R$ is called hypercentral if $ax^{n}=x^{n}a$ for all $x \in R$ and for some $n=n(x,a) \in \mathbb{N}$. Motivated by this definition, we introduce $\mathscr{H}$-Semicommutative rings as a generalization of semicommutative rings and investigate their relations with other classes of rings. We have proven that the class of $\mathscr{H}$-Semicommutative rings lies strictly between Zero-Insertive rings (ZI) and Abelian rings. Additionally, we have demonstrated that if $R$ is $\mathscr{H}$-semicommutative, then for any $n \in \mathbb{N}$, the matrix subring $S_{n}^{'}(R)$ is also $\mathscr{H}$-semicommutative. Among other significant results, we have established that if $R$ is $\mathscr{H}$-semicommutative and left $SF$, then $R$ is strongly regular. We have also shown that $\mathscr{H}$-semicommutative rings are 2-primal, providing sufficient conditions for a ring $R$ to be nil-singular. Additionally, we have proven that if every simple singular module over $R$ is wnil-injective and $R$ is $\mathscr{H}$-semicommutative, then $R$ is reduced. Furthermore, we have studied the relationship of $\mathscr{H}$-semicommutative rings with the classes of Baer, Quasi-Baer, p.p. rings, and p.q. rings in this article, and we have provided some more relevant results.
Submitted 6 January, 2025;
originally announced January 2025.
-
Dual Photonics Probing of Nano- to Submicron-Scale Structural Alterations in Human Brain Tissues or Cells and Chromatin or DNA with the Progression of Alzheimer's Disease
Authors:
Fatemah Alharthi,
Ishmael Apachigawo,
Dhruvil Solanki,
Sazzad Khan,
Himanshi Singh,
Mohammad Moshahid Khan,
Prabhakar Pradhan
Abstract:
Understanding alterations in structural disorder in tissues or cells, or in building blocks such as DNA or chromatin in the human brain, at the nano- to submicron level provides us with efficient biomarkers for Alzheimer's detection. Here, we report a dual photonics technique to detect nano- to submicron-scale alterations in brain tissues or cells and DNA or chromatin due to the early to late progression of Alzheimer's disease (AD) in humans. Using a recently developed mesoscopic light transport technique, fine-focused nano-sensitive partial wave spectroscopy (PWS), we measure the degree of structural disorder in tissues. Furthermore, the chemical-specific inverse participation ratio (IPR) technique was used to measure the DNA or chromatin structural alterations. The results of the PWS and IPR experiments showed a significant increase in the degree of structural disorder at the nano- to submicron scale at different stages of AD relative to their controls, at both the tissue or cell level and the DNA level. The increase in the structural disorder in cells or tissues and DNA or chromatin in the nuclei can be attributed to higher mass density fluctuations in the tissue and DNA or chromatin damage in the nuclei caused by the rearrangements of macromolecules due to the deposition of the amyloid beta protein and damage in DNA or chromatin with the progress of AD.
Submitted 19 December, 2024;
originally announced December 2024.
-
Real-Time Simulation of Asymmetry Generation in Fermion-Bubble Collisions
Authors:
Marcela Carena,
Ying-Ying Li,
Tong Ou,
Hersh Singh
Abstract:
We perform real-time simulation of fermion-bubble scattering during a first-order phase transition by which the fermions become massive. This out-of-equilibrium dynamics can generate a CP asymmetry, which is a crucial ingredient for baryon asymmetry generation in the early Universe. As a prototype, we consider a 1+1-dimensional system, for which CP is replaced by charge conjugation C. We use tensor network methods to study the C asymmetry generation outside the bubble wall induced by a complex fermion mass profile. In the asymptotic region, where reflected particles are far from the scattering point, our lattice calculations are in good agreement with perturbative calculations, but are also applicable in the nonperturbative regime. Real-time evolution of the instantaneous asymmetry generation near the collision point is also accessible within our framework and can be an order of magnitude larger than the asymptotic value. This intriguing feature may have far-reaching consequences in a full model calculation of electroweak baryogenesis. Our studies provide a necessary step for guiding quantum simulations of early universe phase transitions.
Submitted 13 December, 2024;
originally announced December 2024.
-
A theoretical framework for BL Her stars III. A case study: Robust light curve optimisation in the LMC
Authors:
Susmita Das,
László Molnár,
Gábor B. Kovács,
Radoslaw Smolec,
Meridith Joyce,
Shashi M. Kanbur,
Tamás Szklenár,
Anupam Bhardwaj,
Harinder P. Singh,
Marcella Marconi,
Vincenzo Ripepi
Abstract:
We carry out an extensive light curve comparison of BL Her stars using observations from Gaia DR3 and stellar pulsation models computed using MESA-RSP with the goal to obtain the best-matched modeled-observed pairs for BL Her stars in the LMC. We use the Fourier decomposition technique to analyse the light curves in the G band obtained from Gaia DR3 and from MESA-RSP and use a robust light curve fitting approach to score the modeled-observed pairs with respect to their pulsation periods and over their Fourier parameter space. We obtain the best-fit models for 48 BL Her stars in the LMC and thereby provide the stellar parameter estimates of these stars, 30 of which are labelled as the gold sample with superior light curve fits. We find a relatively flat distribution of stellar masses between 0.5-0.65 Msolar for the gold sample of modeled-observed pairs. An interesting result is that the majority of the best-matched models in the gold sample are computed using the convection parameter sets without radiative cooling. The period-Wesenheit relation for the best-matched gold sample of 30 BL Her models exhibits a slope of $-2.805 \pm 0.164$ while the corresponding period-radius relation exhibits a slope of $0.565 \pm 0.035$, both in good agreement with the empirical PW and PR slopes from BL Her stars in the LMC, respectively. We also used the Wesenheit magnitudes of the 30 best-matched modeled-observed pairs to estimate a distance modulus of $μ_{\rm LMC} = 18.582 \pm 0.067$ to the LMC, which lies within the bounds of previous literature values. We also discuss the degeneracy in the stellar parameters of the BL Her models that result in similar pulsation periods and light curve structure, and highlight that caution must be exercised while using the stellar parameter estimates.
Submitted 12 December, 2024;
originally announced December 2024.
-
Surveying Facial Recognition Models for Diverse Indian Demographics: A Comparative Analysis on LFW and Custom Dataset
Authors:
Pranav Pant,
Niharika Dadu,
Harsh V. Singh,
Anshul Thakur
Abstract:
Facial recognition technology has made significant advances, yet its effectiveness across diverse ethnic backgrounds, particularly in specific Indian demographics, is less explored. This paper presents a detailed evaluation of both traditional and deep learning-based facial recognition models using the established LFW dataset and our newly developed IITJ Faces of Academia Dataset (JFAD), which comprises images of students from IIT Jodhpur. This unique dataset is designed to reflect the ethnic diversity of India, providing a critical test bed for assessing model performance in a focused academic environment. We analyze models ranging from holistic approaches like Eigenfaces and SIFT to advanced hybrid models that integrate CNNs with Gabor filters, Laplacian transforms, and segmentation techniques. Our findings reveal significant insights into the models' ability to adapt to the ethnic variability within Indian demographics and suggest modifications to enhance accuracy and inclusivity in real-world applications. The JFAD not only serves as a valuable resource for further research but also highlights the need for developing facial recognition systems that perform equitably across diverse populations.
Submitted 10 December, 2024;
originally announced December 2024.
-
GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis
Authors:
Ashish Goswami,
Satyam Kumar Modi,
Santhosh Rishi Deshineni,
Harman Singh,
Prathosh A. P,
Parag Singla
Abstract:
Text-to-image (T2I) generation has seen significant progress with diffusion models, enabling generation of photo-realistic images from text prompts. Despite this progress, existing methods still face challenges in following complex text prompts, especially those requiring compositional and multi-step reasoning. Given such complex instructions, SOTA models often make mistakes in faithfully modeling object attributes and the relationships among them. In this work, we present an alternate paradigm for T2I synthesis, decomposing the task of complex multi-step generation into three steps: (a) Generate: we first generate an image using existing diffusion models; (b) Plan: we make use of Multi-Modal LLMs (MLLMs) to identify the mistakes in the generated image, expressed in terms of individual objects and their properties, and produce a sequence of corrective steps in the form of an edit-plan; (c) Edit: we make use of existing text-guided image editing models to sequentially execute our edit-plan over the generated image to get the desired image, which is faithful to the original instruction. Our approach derives its strength from the fact that it is modular in nature, is training-free, and can be applied over any combination of image generation and editing models. As an added contribution, we also develop a model capable of compositional editing, which further helps improve the overall accuracy of our proposed approach. Our method flexibly trades inference-time compute for performance on compositional text prompts. We perform extensive experimental evaluation across 3 benchmarks and 10 T2I models, including DALLE-3 and the latest, SD-3.5-Large. Our approach not only improves the performance of the SOTA models by up to 3 points, but also reduces the performance gap between weaker and stronger models. https://dair-iitd.github.io/GraPE/
Submitted 11 March, 2025; v1 submitted 8 December, 2024;
originally announced December 2024.
-
Controlling particle-hole symmetry of fractional quantum Hall states in trilayer graphene
Authors:
Simrandeep Kaur,
Harsimran Singh,
Kenji Watanabe,
Takashi Taniguchi,
Unmesh Ghorai,
Manish Jain,
Rajdeep Sensarma,
Aveek Bid
Abstract:
We present a detailed experimental study of the particle-hole symmetry (PHS) of the fractional quantum Hall (FQH) states about half filling in a multiband system. Specifically, we focus on the lowest Landau level of the monolayer-like band of Bernal stacked trilayer graphene (TLG). In pristine TLG, the excitation energy gaps, Landé g-factor, effective mass, and disorder broadening of the odd-denominator FQH states are identical to their hole-conjugate counterpart. This precise PH symmetry stems from the lattice mirror symmetry that precludes Landau-level mixing. Introducing a non-zero displacement field \(D\) disrupts this mirror symmetry, facilitating the hybridization between the monolayer-like and bilayer-like Landau levels. This inter-band coupling enhances the Landau level mixing factor $η$ and activates three-body interactions -- both of which explicitly break the PHS of FQHs. As a result, conventional FQHs are completely destabilized, offering a route to engineer symmetry breaking of FQHs in a controlled way. We establish that the PHS breaking in TLG is of extrinsic origin and is fundamentally distinct from the intrinsic, interaction-driven symmetry breaking observed in the lowest Landau levels of single-layer and bilayer graphene.
Submitted 13 May, 2025; v1 submitted 27 November, 2024;
originally announced November 2024.
-
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation
Authors:
Harsh Singh,
Rocktim Jyoti Das,
Mingfei Han,
Preslav Nakov,
Ivan Laptev
Abstract:
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. While recent efforts in robotics have leveraged LLMs both for high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to the generation of plans in a single pass without real-time feedback. To address these limitations, we propose a novel multi-agent LLM framework, Multi-Agent Large Language Model for Manipulation (MALMM) that distributes high-level planning and low-level control code generation across specialized LLM agents, supervised by an additional agent that dynamically manages transitions. By incorporating observations from the environment after each step, our framework effectively handles intermediate failures and enables adaptive re-planning. Unlike existing methods, our approach does not rely on pre-trained skill policies or in-context learning examples and generalizes to a variety of new tasks. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting, thereby overcoming key limitations of existing LLM-based manipulation methods.
Submitted 26 November, 2024;
originally announced November 2024.
-
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Authors:
Ashmal Vayani,
Dinura Dissanayake,
Hasindri Watawana,
Noor Ahsan,
Nevasini Sasikumar,
Omkar Thawakar,
Henok Biadglign Ademtew,
Yahya Hmaiti,
Amandeep Kumar,
Kartik Kuckreja,
Mykola Maslych,
Wafa Al Ghallabi,
Mihail Mihaylov,
Chao Qin,
Abdelrahman M Shaker,
Mike Zhang,
Mahardika Krisna Ihsani,
Amiel Esplana,
Monil Gokani,
Shachar Mirkin,
Harsh Singh,
Ashay Srivastava,
Endre Hamerlik,
Fathinah Asma Izzati,
Fadillah Adamsyah Maani
, et al. (44 additional authors not shown)
Abstract:
Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.
Submitted 30 April, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering
Authors:
Dylan Goetting,
Himanshu Gaurav Singh,
Antonio Loquercio
Abstract:
We present VLMnav, an embodied framework to transform a Vision-Language Model (VLM) into an end-to-end navigation policy. In contrast to prior work, we do not rely on a separation between perception, planning, and control; instead, we use a VLM to directly select actions in one step. Surprisingly, we find that a VLM can be used as an end-to-end policy zero-shot, i.e., without any fine-tuning or exposure to navigation data. This makes our approach open-ended and generalizable to any downstream navigation task. We run an extensive study to evaluate the performance of our approach in comparison to baseline prompting methods. In addition, we perform a design analysis to understand the most impactful design decisions. Visual examples and code for our project can be found at https://jirl-upenn.github.io/VLMnav/
Submitted 8 November, 2024;
originally announced November 2024.
-
Large Scale Response of Gapless $1d$ and Quasi-$1d$ Systems
Authors:
Marcello Porta,
Harman Preet Singh
Abstract:
We consider the transport properties of non-interacting, gapless one-dimensional quantum systems and of the edge modes of two-dimensional topological insulators, in the presence of time-dependent perturbations. We prove the validity of the Kubo formula, in the zero temperature and infinite volume limit, for a class of perturbations that are weak and slowly varying in space and in time, in an Euler-like scaling. The proof relies on the representation of the real-time Duhamel series in imaginary time, which allows us to prove its convergence uniformly in the scaling parameter and in the size of the system, at low temperatures. Furthermore, it allows us to exploit a suitable cancellation for the scaling limit of the model, related to the emergent anomalous chiral gauge symmetry of relativistic one-dimensional fermions. The cancellation implies that, as the temperature and the scaling parameter are sent to zero, the linear response is the only contribution to the full response of the system. The explicit form of the leading contribution to the response function is determined by lattice conservation laws. In particular, the method allows us to prove the quantization of the edge conductance of $2d$ quantum Hall systems from quantum dynamics.
Submitted 6 November, 2024;
originally announced November 2024.
-
Spatial-Temporal Bearing Fault Detection Using Graph Attention Networks and LSTM
Authors:
Moirangthem Tiken Singh,
Rabinder Kumar Prasad,
Gurumayum Robert Michael,
N. Hemarjit Singh,
N. K. Kaphungkui
Abstract:
Purpose: This paper aims to enhance bearing fault diagnosis in industrial machinery by introducing a novel method that combines Graph Attention Network (GAT) and Long Short-Term Memory (LSTM) networks. This approach captures both spatial and temporal dependencies within sensor data, improving the accuracy of bearing fault detection under various conditions. Methodology: The proposed method converts time series sensor data into graph representations. GAT captures spatial relationships between components, while LSTM models temporal patterns. The model is validated using the Case Western Reserve University (CWRU) Bearing Dataset, which includes data under different horsepower levels and both normal and faulty conditions. Its performance is compared with methods such as K-Nearest Neighbors (KNN), Local Outlier Factor (LOF), Isolation Forest (IForest) and GNN-based method for bearing fault detection (GNNBFD). Findings: The model achieved outstanding results, with precision, recall, and F1-scores reaching 100\% across various testing conditions. It not only identifies faults accurately but also generalizes effectively across different operational scenarios, outperforming traditional methods. Originality: This research presents a unique combination of GAT and LSTM for fault detection, overcoming the limitations of traditional time series methods by capturing complex spatial-temporal dependencies. Its superior performance demonstrates significant potential for predictive maintenance in industrial applications.
Submitted 15 October, 2024;
originally announced October 2024.
-
High sensitivity pressure and temperature quantum sensing in organic crystals
Authors:
Harpreet Singh,
Noella DSouza,
Joseph Garrett,
Angad Singh,
Brian Blankenship,
Emanuel Druga,
Riccardo Montis,
Liang Tan,
Ashok Ajoy
Abstract:
The inherent sensitivity of quantum sensors to their physical environment can make them good reporters of parameters such as temperature, pressure, strain, and electric fields. Here, we present a molecular platform for pressure (P) and temperature (T) sensing using para-terphenyl crystals doped with pentacene. We leverage the optically detected magnetic resonance (ODMR) of the photoexcited triplet electron in the pentacene molecule, which serves as a sensitive probe for lattice changes in the host para-terphenyl due to pressure or temperature variations. We observe maximal ODMR frequency variations of df/dP=1.8 MHz/bar and df/dT=247 kHz/K, which are over 1,200 times and three times greater, respectively, than those seen in nitrogen-vacancy centers in diamond. This results in a >85-fold improvement in pressure sensitivity over the best previously reported value. The larger variation reflects the weaker nature of the para-terphenyl lattice, with first-principles DFT calculations indicating that even picometer-level shifts in the molecular orbitals due to P and T changes are measurable. The platform offers additional advantages, including high levels of sensor doping, narrow ODMR linewidths and high contrasts, and ease of deployment, leveraging the ability to grow large single crystals at low cost. Overall, this work paves the way for low-cost, optically interrogated pressure and temperature sensors and lays the foundation for even more versatile sensors enabled by synthetic tunability in designer molecular systems.
Submitted 14 October, 2024;
originally announced October 2024.
-
Hopping Forcing Number in Random $d$-regular Graphs
Authors:
Pawel Pralat,
Harjas Singh
Abstract:
Hopping forcing is a single player combinatorial game in which the player is presented a graph on $n$ vertices, some of which are initially blue with the remaining vertices being white. In each round $t$, a blue vertex $v$ with all neighbours blue may hop and colour a white vertex blue in the second neighbourhood, provided that $v$ has not performed a hop in the previous $t-1$ rounds. The objective of the game is to eventually colour every vertex blue by repeatedly applying the hopping forcing rule. Subsequently, for a given graph $G$, the hopping forcing number is the minimum number of initial blue vertices that are required to achieve the objective.
In this paper, we study the hopping forcing number for random $d$-regular graphs. Specifically, we aim to derive asymptotic upper and lower bounds for the hopping forcing number for various values of $d \geq 2$.
Submitted 10 October, 2024;
originally announced October 2024.