-
Interplay of prompt and non-prompt photons in photon-triggered jet observables
Authors:
Chathuranga Sirimanna,
Yasuki Tachibana,
Abhijit Majumder,
Aaron Angerami,
Ritu Arora,
Steffen Bass,
Yi Chen,
Ritoban Datta,
Lipei Du,
Raymond Ehlers,
Hannah Elfner,
Rainer J. Fries,
Charles Gale,
Yayun He,
Barbara Jacak,
Peter Jacobs,
Sangyong Jeon,
Yi Ji,
Florian Jonas,
Lauren Kasper,
Michael Kordell,
Amit Kumar,
Raghav Kunnawalkam-Elayavalli,
Joseph Latessa,
Yen-Jie Lee
, et al. (27 additional authors not shown)
Abstract:
Prompt photons are important yet challenging to observe in relativistic heavy-ion collisions, as they are produced in the early stages and traverse almost the entire QGP medium without interaction. Experimental analyses typically employ isolation cuts, in the hope to identify prompt photons. Most theoretical studies consider only events with actual prompt photons, assuming no contribution from isolated non-prompt photons to reduce computational cost. For the first time, we present a study that compares simulation results generated using inclusive (bremsstrahlung) and prompt-photon events with multiple experimental observables for both $p-p$ and $Pb-Pb$ collisions at $5.02$ TeV. Simulations are carried out using the multi-stage JETSCAPE framework tuned to describe the quenching of jets and hadrons. Isolated non-prompt photons are generated in hard photon bremsstrahlung, where the photon is radiated at a sufficient angle to the jet. Several photon triggered jet and jet substructure observables show significant contributions from inclusive photons, yielding an improvement in comparison with experimental data. Novel photon triggered jet substructure observables are also expected to show new structures, yet to be detected in experiment. This effort examines the significance of isolated non-prompt photons using parameters tuned for a simultaneous description of the leading hadron and jet spectrum, and thus provides an independent verification of the multistage evolution framework.
Submitted 1 July, 2025;
originally announced July 2025.
-
Effects of hadronic reinteraction on jet fragmentation from small to large systems
Authors:
Hendrik Roch,
Aaron Angerami,
Ritu Arora,
Steffen Bass,
Yi Chen,
Ritoban Datta,
Lipei Du,
Raymond Ehlers,
Hannah Elfner,
Rainer J. Fries,
Charles Gale,
Yayun He,
Barbara Jacak,
Peter Jacobs,
Sangyong Jeon,
Yi Ji,
Florian Jonas,
Lauren Kasper,
Michael Kordell II,
Amit Kumar,
Raghav Kunnawalkam-Elayavalli,
Joseph Latessa,
Yen-Jie Lee,
Roy Lemmon,
Matt Luzum
, et al. (27 additional authors not shown)
Abstract:
We investigate the impact of the hadronic phase on jet quenching in nuclear collider experiments, an open question in heavy-ion physics. Previous studies in a simplified setup suggest that hadronic interactions could have significant effects, but a systematic analysis is needed. Using the X-SCAPE event generator with the SMASH afterburner, we study the role of hadronic rescattering on jet fragmentation hadrons. Applying this framework to $e^++e^-$ collisions, we demonstrate that even in small systems with limited particle production, hadronic interactions lead to measurable modifications in final-state hadronic and jet observables by comparing scenarios with and without afterburner rescattering.
Submitted 19 June, 2025;
originally announced June 2025.
-
Extraction of jet-medium interaction details through jet substructure for inclusive and gamma-tagged jets
Authors:
Y. Tachibana,
C. Sirimanna,
A. Majumder,
A. Angerami,
R. Arora,
S. A. Bass,
Y. Chen,
R. Datta,
L. Du,
R. Ehlers,
H. Elfner,
R. J. Fries,
C. Gale,
Y. He,
B. V. Jacak,
P. M. Jacobs,
S. Jeon,
Y. Ji,
F. Jonas,
L. Kasper,
M. Kordell II,
A. Kumar,
R. Kunnawalkam-Elayavalli,
J. Latessa,
Y. -J. Lee
, et al. (27 additional authors not shown)
Abstract:
We present a comprehensive study of jet substructure modifications in high-energy heavy-ion collisions using both inclusive jets and $γ$-tagged jets, based on a multi-stage jet evolution model within the Monte Carlo framework JETSCAPE. To investigate hard parton splittings inside jets, we focus on Soft Drop observables. Our results for the groomed splitting radius and groomed jet mass distributions of inclusive jets show a slight narrowing compared to proton-proton baselines. We demonstrate that this apparent narrowing is primarily a selection bias from energy loss, rather than a direct modification of the splitting structure, by analyzing $γ$-tagged jets, where such bias is eliminated or significantly reduced. We also show that quark jets exhibit genuine modifications in their splitting structure, which is not seen in gluon jets. These effects are clearly visible in the substructure of $γ$-tagged jets, which are dominated by quark jets, but are not apparent for inclusive jets. This demonstrates that $γ$-tagged jets offer a powerful probe of medium-induced modifications to the hard splitting structure of jets.
Submitted 18 June, 2025;
originally announced June 2025.
-
First Finish Search: Efficient Test-Time Scaling in Large Language Models
Authors:
Aradhye Agarwal,
Ayan Sengupta,
Tanmoy Chakraborty
Abstract:
Test-time scaling (TTS), which involves dynamic allocation of compute during inference, offers a promising way to improve reasoning in large language models. While existing TTS methods work well, they often rely on long decoding paths or require a large number of samples to be generated, increasing the token usage and inference latency. We observe the surprising fact that for reasoning tasks, shorter traces are much more likely to be correct than longer ones. Motivated by this, we introduce First Finish Search (FFS), a training-free parallel decoding strategy that launches $n$ independent samples and returns as soon as any one completes. We evaluate FFS alongside simple decoding, beam search, majority voting, and budget forcing on four reasoning models (DeepSeek-R1, R1-Distill-Qwen-32B, QwQ-32B and Phi-4-Reasoning-Plus) and across four datasets (AIME24, AIME25-I, AIME25-II and GPQA Diamond). With DeepSeek-R1, FFS achieves $82.23\%$ accuracy on the AIME datasets, a $15\%$ improvement over DeepSeek-R1's standalone accuracy, nearly matching OpenAI's o4-mini performance. Our theoretical analysis explains why stopping at the shortest trace is likely to yield a correct answer and identifies the conditions under which early stopping may be suboptimal. The elegance and simplicity of FFS demonstrate that straightforward TTS strategies can perform remarkably well, revealing the untapped potential of simple approaches at inference time.
Submitted 23 May, 2025;
originally announced May 2025.
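The First Finish Search idea in the abstract above (launch $n$ independent samples, return as soon as any one completes) can be illustrated with a minimal sketch. This is not the authors' implementation. It replaces real parallel decoding with a deterministic lockstep simulation, the `samplers` iterators are hypothetical stand-ins for model decoding streams, and breaking ties by lowest index is an assumption.

```python
from typing import Iterator, List, Optional, Sequence


def first_finish_search(
    samplers: Sequence[Iterator[str]], eos: str = "<eos>"
) -> Optional[List[str]]:
    """Advance every sampler one token per round; return the first finished trace.

    Each sampler is a hypothetical token stream standing in for one
    independent decoding run. The first stream to emit `eos` wins, which
    in lockstep decoding is the shortest trace.
    """
    traces: List[List[str]] = [[] for _ in samplers]
    active = list(range(len(samplers)))
    while active:
        for i in list(active):  # iterate over a copy so removal is safe
            token = next(samplers[i], None)
            if token is None:
                active.remove(i)  # stream ended without emitting eos
                continue
            if token == eos:
                return traces[i]  # stop immediately; other runs are abandoned
            traces[i].append(token)
    return None  # no sampler ever finished


# Three simulated runs of different lengths; the two-token stream finishes first.
runs = [
    iter(["a", "b", "c", "<eos>"]),
    iter(["x", "<eos>"]),
    iter(["p", "q", "<eos>"]),
]
shortest = first_finish_search(runs)
```

In a real deployment the rounds would correspond to batched decoding steps, and abandoning the remaining runs is where the token and latency savings would come from.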
-
On the Generalization vs Fidelity Paradox in Knowledge Distillation
Authors:
Suhas Kamasetty Ramesh,
Ayan Sengupta,
Tanmoy Chakraborty
Abstract:
Knowledge distillation (KD) is a key technique for compressing large language models into smaller ones while preserving performance. Despite the recent traction of KD research, its effectiveness for smaller language models (LMs) and the mechanisms driving knowledge transfer remain underexplored. In this work, we present the first large-scale empirical and statistical analysis of KD across models ranging from 0.5B to 7B parameters on 14 complex reasoning tasks in a zero-shot setting. Our findings reveal that KD can improve the average performance of smaller models by up to $10\%$, with a peak task specific gain of $22\%$, while providing only marginal benefits ($\sim 1.3\%$) for larger models. Surprisingly, teacher performance has a minimal impact on student outcomes, while teacher task expertise impacts KD effectiveness. A correlation study indicates that smaller LMs benefit more from KD, whereas larger LMs show diminished gains. Additionally, we uncover a misalignment between improvements in student performance and reasoning fidelity, suggesting that while KD enhances accuracy, it does not always maintain the structured decision-making processes of the teacher. Our ablation study further highlights the importance of teacher signals and logit smoothing in influencing students' performance after distillation. Overall, our study offers a comprehensive empirical and statistical assessment of KD, highlighting both its benefits and trade-offs when distilling knowledge from larger to smaller LMs.
Submitted 21 May, 2025;
originally announced May 2025.
-
Deformation of Jets Induced by Ambient Medium Flow
Authors:
Arjun Sengupta,
Rainer J. Fries
Abstract:
The evolution of jet showers in high energy nuclear collisions is influenced in various ways by the presence of a surrounding medium. The interaction of jet constituents with the medium can happen during the partonic stage of the jet, during hadronization, and even during its hadronic stage. We demonstrate how flow of the ambient medium in a direction transverse to the jet can introduce both dipole and quadrupole deformations. We propose to analyze the $n=1$ and $n=2$ harmonic deformations of soft and semi-hard hadrons or subjets in a jet with respect to the jet core using the method of $q$-vectors. We discuss simulations which show how the transverse shapes and their preferred angles evolve when the ambient environment of jets changes from the vacuum to a parton medium without flow and finally to a medium with various rates of transverse flow. Our study includes the effects of both flow during the development of the parton shower and hadronization. The existence of dipole deformations, and the correlation of the angles of dipole and quadrupole deformations could constitute promising experimental signals for the presence and size of ambient transverse flow.
Submitted 20 May, 2025;
originally announced May 2025.
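The abstract above does not spell out the $q$-vector method. As a rough illustration, a weighted $n$-th harmonic $q$-vector is commonly written as $q_n = \sum_i w_i e^{i n \phi_i} / \sum_i w_i$. The sketch below uses that form. Two choices in it are assumptions and not taken from the paper: angles are measured around the jet axis, and the weights stand for something like transverse momentum.

```python
import cmath
import math
from typing import Sequence, Tuple


def q_vector(
    phis: Sequence[float], weights: Sequence[float], n: int
) -> Tuple[float, float]:
    """Return (magnitude, preferred angle) of the weighted n-th harmonic q-vector.

    phis    : azimuthal angles of jet constituents around the jet axis (radians)
    weights : per-constituent weights (assumed here to be e.g. transverse momentum)
    n       : harmonic order (1 = dipole, 2 = quadrupole)
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    q = sum(w * cmath.exp(1j * n * phi) for phi, w in zip(phis, weights)) / total
    # The preferred angle of the n-th harmonic is the phase of q divided by n.
    return abs(q), cmath.phase(q) / n


# A back-to-back pair of equal-weight constituents: the dipole term cancels,
# while the quadrupole term adds coherently.
phis = [0.0, math.pi]
weights = [1.0, 1.0]
q1_mag, _ = q_vector(phis, weights, 1)
q2_mag, psi2 = q_vector(phis, weights, 2)
```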
-
Revolutionising Bacterial Genomics: Graph-Based Strategies for Improved Variant Identification
Authors:
Fathima Nuzla Ismail,
Abira Sengupta
Abstract:
A significant advancement in bioinformatics is using genome graph techniques to improve variation discovery across organisms. Traditional approaches, such as bwa mem, rely on linear reference genomes for genomic analyses but may introduce biases when applied to highly diverse bacterial genomes of the same species. Pangenome graphs provide an alternative paradigm for evaluating structural and minor variations within a graphical framework, including insertions, deletions, and single nucleotide polymorphisms. Pangenome graphs enhance the detection and interpretation of complex genetic variants by representing the full genetic diversity of a species. In this study, we present a robust and reliable bioinformatics pipeline utilising the PanGenome Graph Builder (PGGB) and the Variation Graph toolbox (vg giraffe) to align whole-genome sequencing data, call variants against a graph reference, and construct pangenomes from assembled genomes. Our results demonstrate that leveraging pangenome graphs over a single linear reference genome significantly improves mapping rates and variant calling accuracy for simulated and actual bacterial pathogens datasets.
Submitted 12 May, 2025;
originally announced May 2025.
-
Spatio-temporal pulse propagation during highly-resolved onset of Rayleigh-Taylor and Kelvin-Helmholtz Rayleigh-Taylor instabilities
Authors:
Bhavna Joshi,
Aditi Sengupta,
Yassin Ajanif,
Lucas Lestandi
Abstract:
The present study explores the onset of the Rayleigh-Taylor instability (RTI) and Kelvin-Helmholtz Rayleigh-Taylor instability (KHRTI) with highly-resolved direct numerical simulations of two setups which consider air at different temperatures (or densities) and/or velocities in two halves of three-dimensional cuboidal domains. The compressible Navier-Stokes equations are solved using a novel parallel algorithm which does not involve overlapping points at sub-domain boundaries. The pressure disturbance field is compared during onset of RTI and KHRTI and corresponding convection- and advection-dominated mechanisms are highlighted by instantaneous features, spectra, and proper orthogonal decomposition. The relative contributions of pressure, kinetic energy and rotational energy to the overall energy budget are explored for both instabilities, revealing acoustic trigger to be the incipient mechanism for both RTI and KHRTI. The nonlinear, spatio-temporal nature of the instability is further explored by application of a transport equation for enstrophy of compressible flows. This provides insights into the similarities and differences between the onset mechanisms of RTI and KHRTI, serving as a benchmark data set for shear and buoyancy-driven instabilities across diverse applications in geophysics, nuclear energy and atmospheric fluid dynamics.
Submitted 12 May, 2025;
originally announced May 2025.
-
Big Data Architecture for Large Organizations
Authors:
Fathima Nuzla Ismail,
Abira Sengupta,
Shanika Amarasoma
Abstract:
The exponential growth of big data has transformed how large organisations leverage information to drive innovation, optimise processes, and maintain competitive advantages. However, managing and extracting insights from vast, heterogeneous data sources requires a scalable, secure, and well-integrated big data architecture. This paper proposes a comprehensive big data framework that aligns with organisational objectives while ensuring flexibility, scalability, and governance. The architecture encompasses multiple layers, including data ingestion, transformation, storage, analytics, machine learning, and security, incorporating emerging technologies such as Generative AI (GenAI) and low-code machine learning. Cloud-based implementations across Google Cloud, AWS, and Microsoft Azure are analysed, highlighting their tools and capabilities. Additionally, this study explores advancements in big data architecture, including AI-driven automation, data mesh, and Data Ocean paradigms. By establishing a structured, adaptable framework, this research provides a foundational blueprint for large organisations to harness big data as a strategic asset effectively.
Submitted 7 May, 2025;
originally announced May 2025.
-
Dendritic Computing with Multi-Gate Ferroelectric Field-Effect Transistors
Authors:
A N M Nafiul Islam,
Xuezhong Niu,
Jiahui Duan,
Shubham Kumar,
Kai Ni,
Abhronil Sengupta
Abstract:
Although inspired by neuronal systems in the brain, artificial neural networks generally employ point-neurons, which offer far less computational complexity than their biological counterparts. Neurons have dendritic arbors that connect to different sets of synapses and offer local non-linear accumulation - playing a pivotal role in processing and learning. Inspired by this, we propose a novel neuron design based on a multi-gate ferroelectric field-effect transistor that mimics dendrites. It leverages ferroelectric nonlinearity for local computations within dendritic branches, while utilizing the transistor action to generate the final neuronal output. The branched architecture paves the way for utilizing smaller crossbar arrays in hardware integration, leading to greater efficiency. Using an experimentally calibrated device-circuit-algorithm co-simulation framework, we demonstrate that networks incorporating our dendritic neurons achieve superior performance in comparison to much larger networks without dendrites ($\sim$17$\times$ fewer trainable weight parameters). These findings suggest that dendritic hardware can significantly improve the computational efficiency and learning capacity of neuromorphic systems optimized for edge applications.
Submitted 2 May, 2025;
originally announced May 2025.
-
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Authors:
Yash Goel,
Ayan Sengupta,
Tanmoy Chakraborty
Abstract:
We challenge the dominant focus on neural scaling laws and advocate for a paradigm shift toward downscaling in the development of large language models (LLMs). While scaling laws have provided critical insights into performance improvements through increasing model and dataset size, we emphasize the significant limitations of this approach, particularly in terms of computational inefficiency, environmental impact, and deployment constraints. To address these challenges, we propose a holistic framework for downscaling LLMs that seeks to maintain performance while drastically reducing resource demands. This paper outlines practical strategies for transitioning away from traditional scaling paradigms, advocating for a more sustainable, efficient, and accessible approach to LLM development.
Submitted 25 May, 2025; v1 submitted 2 May, 2025;
originally announced May 2025.
-
Enhancing short-term traffic prediction by integrating trends and fluctuations with attention mechanism
Authors:
Adway Das,
Agnimitra Sengupta,
S. Ilgin Guler
Abstract:
Traffic flow prediction is a critical component of intelligent transportation systems, yet accurately forecasting traffic remains challenging due to the interaction between long-term trends and short-term fluctuations. Standard deep learning models often struggle with these challenges because their architectures inherently smooth over fine-grained fluctuations while focusing on general trends. This limitation arises from low-pass filtering effects, gate biases favoring stability, and memory update mechanisms that prioritize long-term information retention. To address these shortcomings, this study introduces a hybrid deep learning framework that integrates both long-term trend and short-term fluctuation information using two input features processed in parallel, designed to capture complementary aspects of traffic flow dynamics. Further, our approach leverages attention mechanisms, specifically Bahdanau attention, to selectively focus on critical time steps within traffic data, enhancing the model's ability to predict congestion and other transient phenomena. Experimental results demonstrate that features learned from both branches are complementary, significantly improving the goodness-of-fit statistics across multiple prediction horizons compared to a baseline model. Notably, the attention mechanism enhances short-term forecast accuracy by directly targeting immediate fluctuations, though challenges remain in fully integrating long-term trends. This framework can contribute to more effective congestion mitigation and urban mobility planning by advancing the robustness and precision of traffic prediction models.
Submitted 28 April, 2025;
originally announced April 2025.
-
Generalised tree modules: Hom-sets and indecomposability
Authors:
Annoy Sengupta,
Amit Kuber
Abstract:
For a zero-relation algebra over a field $\mathcal K$, Crawley-Boevey introduced the concept of a tree module and provided a combinatorial description of a basis for the space of homomorphisms between two tree modules--the basis elements are called graph maps. The indecomposability of tree modules is essentially due to Gabriel. We relax a condition in the definition of a tree module to define generalised tree modules and when $\mathrm{char}(\mathcal K)\neq2$, under a certain condition, provide a combinatorial description of a finite generating set for the space of homomorphisms between two such modules--we call the generators generalised graph maps. As an application, we provide a sufficient condition for the (in)decomposability of certain generalised tree modules. We also show that all indecomposable modules over a Dynkin quiver of type $\mathbf D$ are isomorphic to generalised tree modules--this result also follows from a theorem of Ringel which states that all exceptional modules over the path algebra $\mathcal KQ$ of a finite quiver $Q$ are generalised tree modules.
Submitted 22 May, 2025; v1 submitted 26 April, 2025;
originally announced April 2025.
-
Rank Bounds and PIT for $Σ^3 ΠΣΠ^d$ circuits via a non-linear Edelstein-Kelly theorem
Authors:
Abhibhav Garg,
Rafael Oliveira,
Akash Kumar Sengupta
Abstract:
We prove a non-linear Edelstein-Kelly theorem for polynomials of constant degree, fully settling a stronger form of Conjecture 30 in Gupta (2014), and generalizing the main result of Peleg and Shpilka (STOC 2021) from quadratic polynomials to polynomials of any constant degree.
As a consequence of our result, we obtain constant rank bounds for depth-4 circuits with top fanin 3 and constant bottom fanin (denoted $Σ^{3}ΠΣΠ^{d}$ circuits) which compute the zero polynomial. This settles a stronger form of Conjecture 1 in Gupta (2014) when $k=3$, for any constant degree bound; additionally this also makes progress on Conjecture 28 in Beecken, Mittmann, and Saxena (Information \& Computation, 2013). Our rank bounds, when combined with Theorem 2 in Beecken, Mittmann, and Saxena (Information \& Computation, 2013) yield the first deterministic, polynomial time PIT algorithm for $Σ^{3}ΠΣΠ^{d}$ circuits.
Submitted 25 April, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
-
A Bio-inspired Asymmetric Double-Gate Ferroelectric FET for Emulating Astrocyte and Dendrite Dynamics in Neuromorphic Systems
Authors:
Zhouhang Jiang,
A N M Nafiul Islam,
Zhuangyu Han,
Zijian Zhao,
Franz Müller,
Jiahui Duan,
Halid Mulaosmanovic,
Stefan Dünkel,
Sven Beyer,
Sourav Dutta,
Vijaykrishnan Narayanan,
Thomas Kämpfe,
Suma George Cardwell,
Frances Chance,
Abhronil Sengupta,
Kai Ni
Abstract:
Neuromorphic systems seek to replicate the functionalities of biological neural networks to attain significant improvements in performance and efficiency of AI computing platforms. However, these systems have generally remained limited to emulation of simple neurons and synapses, and have ignored higher order functionalities enabled by other components of the brain like astrocytes and dendrites. In this work, drawing inspiration from biology, we introduce a compact Double-Gate Ferroelectric Field Effect Transistor (DG-FeFET) cell that can emulate the dynamics of both astrocytes and dendrites within neuromorphic architectures. We demonstrate that with a ferroelectric top gate for synaptic weight programming as in conventional synapses and a non-ferroelectric back gate, the DG-FeFET realizes a synapse with a dynamic gain modulation mechanism. This can be leveraged as an analog for a compact astrocyte-tripartite synapse, as well as enabling dendrite-like gain modulation operations. By employing a fully-depleted silicon-on-insulator (FDSOI) FeFET as our double-gate device, we validate the linear control of the synaptic weight via the back gate terminal (i.e., the gate underneath the buried oxide (BOX) layer) through comprehensive theoretical and experimental studies. We showcase the promise such a tripartite synaptic device holds for numerous important neuromorphic applications, including autonomous self-repair of faulty neuromorphic hardware mediated by astrocytic functionality. Coordinate transformations based on dragonfly prey-interception circuitry models are also demonstrated based on dendritic function emulation by the device. This work paves the way forward for developing truly "brain-like" neuromorphic hardware that goes beyond the current dogma focusing only on neurons and synapses.
Submitted 19 April, 2025;
originally announced April 2025.
-
Enhancing Deterministic Freezing Level Predictions in the Northern Sierra Nevada Through Deep Neural Networks
Authors:
Vesta Afzali Gorooh,
Agniv Sengupta,
Shawn Roj,
Rachel Weihs,
Brian Kawzenuk,
Luca Delle Monache,
F. Martin Ralph
Abstract:
Accurate prediction of the freezing level (FZL) is essential for hydrometeorological forecasting systems and precipitation phase estimation, and it influences runoff generation and reservoir management decisions. In this study, we develop a deep learning based postprocessing framework using the Unet convolutional neural network (CNN) architecture to refine the FZL forecasts from the West Weather Research and Forecasting (West-WRF) model. The proposed framework leverages reforecast data from West WRF and FZL estimates from the California Nevada River Forecast Center (CNRFC) to train a deterministic Unet model over the Yuba-Feather watershed, a hydrologically critical basin in northern California. We introduce two variants of our model, Unet-Log and Unet-GMM, which utilize the logarithm of the hyperbolic cosine of Error and Gaussian Mixture Model loss functions, respectively, to enhance FZL forecast accuracy beyond an RMSE based benchmark. Results indicate that the Unet based postprocessing framework significantly improves FZL forecast skill across diverse atmospheric conditions and complex topography. Compared to the raw West-WRF output, our model achieves reductions in RMSE of up to 25% and increases the forecast observation correlation by about 10% over the Yuba-Feather watershed. Furthermore, it effectively captures the spatiotemporal variability of the FZL across different elevations, mitigating systematic biases inherent in the West-WRF model. This novel deep learning based postprocessing approach demonstrates a promising pathway for integrating machine learning into hydrometeorological forecasting and decision support within the Forecast Informed Reservoir Operations (FIRO) framework.
Submitted 15 April, 2025;
originally announced April 2025.
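The Unet-Log variant in the abstract above is trained with the logarithm of the hyperbolic cosine of the error. The abstract gives no formula, so the sketch below uses the standard log-cosh definition, $\frac{1}{N}\sum_i \log\cosh(\hat{y}_i - y_i)$, in a numerically stable form. The function names and the example freezing-level values are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def log_cosh(x: float) -> float:
    """Stable log(cosh(x)): uses |x| + log(1 + exp(-2|x|)) - log(2).

    This identity avoids overflow in cosh for large |x|.
    """
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def log_cosh_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean log-cosh error between predictions and targets of equal length."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum(log_cosh(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical freezing-level forecasts and reference values (arbitrary units).
forecast = [2.1, 1.8, 2.5]
reference = [2.0, 2.0, 2.4]
loss = log_cosh_loss(forecast, reference)
```

For small errors, log-cosh behaves like half the squared error, and for large errors like the absolute error minus $\log 2$, so it is less sensitive to outliers than a squared-error loss.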
-
Toward Spiking Neural Network Local Learning Modules Resistant to Adversarial Attacks
Authors:
Jiaqi Lin,
Abhronil Sengupta
Abstract:
Recent research has shown the vulnerability of Spiking Neural Networks (SNNs) under adversarial examples that are nearly indistinguishable from clean data in the context of frame-based and event-based information. The majority of these studies are constrained in generating adversarial examples using Backpropagation Through Time (BPTT), a gradient-based method which lacks biological plausibility. In contrast, local learning methods, which relax many of BPTT's constraints, remain under-explored in the context of adversarial attacks. To address this problem, we examine adversarial robustness in SNNs through the framework of four types of training algorithms. We provide an in-depth analysis of the ineffectiveness of gradient-based adversarial attacks to generate adversarial instances in this scenario. To overcome these limitations, we introduce a hybrid adversarial attack paradigm that leverages the transferability of adversarial instances. The proposed hybrid approach demonstrates superior performance, outperforming existing adversarial attack methods. Furthermore, the generalizability of the method is assessed under multi-step adversarial attacks, adversarial attacks in black-box FGSM scenarios, and within the non-spiking domain.
Submitted 11 April, 2025;
originally announced April 2025.
-
Hybrid machine learning models based on physical patterns to accelerate CFD simulations: a short guide on autoregressive models
Authors:
Arindam Sengupta,
Rodrigo Abadía-Heredia,
Ashton Hetherington,
José Miguel Pérez,
Soledad Le Clainche
Abstract:
Accurate modeling of the complex dynamics of fluid flows is a fundamental challenge in computational physics and engineering. This study presents an innovative integration of High-Order Singular Value Decomposition (HOSVD) with Long Short-Term Memory (LSTM) architectures to address the complexities of reduced-order modeling (ROM) in fluid dynamics. HOSVD improves the dimensionality reduction process by preserving multidimensional structures, surpassing the limitations of Singular Value Decomposition (SVD). The methodology is tested across numerical and experimental data sets, including two- and three-dimensional (2D and 3D) cylinder wake flows, spanning both laminar and turbulent regimes. The emphasis is also on exploring how the depth and complexity of LSTM architectures contribute to improving predictive performance. Simpler architectures with a single dense layer effectively capture the periodic dynamics, demonstrating the network's ability to model non-linearities and chaotic dynamics. The addition of extra layers provides higher accuracy at minimal computational cost. These additional layers enable the network to expand its representational capacity, improving the prediction accuracy and reliability. The results demonstrate that HOSVD outperforms SVD in all tested scenarios, as evidenced by using different error metrics. Efficient mode truncation by HOSVD-based models enables the capture of complex temporal patterns, offering reliable predictions even in challenging, noise-influenced data sets. The findings underscore the adaptability and robustness of HOSVD-LSTM architectures, offering a scalable framework for modeling fluid dynamics.
Submitted 9 April, 2025;
originally announced April 2025.
-
Compression Laws for Large Language Models
Authors:
Ayan Sengupta,
Siddhant Chaudhary,
Tanmoy Chakraborty
Abstract:
We introduce compression laws for large language models (LLMs). While recent scaling laws have sought to understand how LLMs scale with respect to model size, pre-training data, and computational resources, we focus on understanding how model compression affects the performance of a pre-trained LLM on downstream tasks. We empirically examine the effects of structured model compression on LLMs through over $1000$ experiments across eight models with sizes ranging from $0.5B$ to $14B$ parameters. Our findings indicate that the test cross-entropy loss increases quadratically with the compression ratio, whereas performance on downstream tasks declines only linearly. Our study emphasizes the importance of recovery fine-tuning in enhancing generation loss, showing that the test loss of compressed LLMs can improve by up to 55% with recovery fine-tuning. At higher compression ratios (up to 90%), compressed LLMs demonstrate a speed increase of 60% during inference compared to their uncompressed counterparts, compensating for the performance degradation at this level. However, for smaller models ($\le 7B$), the computational gains are limited, peaking at just 35%. We conclude that model compression can be highly beneficial for larger models, especially when a smaller model within the same computational budget is not available. These insights provide practical guidelines for utilizing model compression techniques for adopting LLMs in real-life applications in resource-constrained settings.
Submitted 5 April, 2025;
originally announced April 2025.
-
Enhanced signal of momentum broadening in hard splittings for $γ$-tagged jets in a multistage approach
Authors:
Y. Tachibana,
C. Sirimanna,
A. Majumder,
A. Angerami,
R. Arora,
S. A. Bass,
Y. Chen,
R. Datta,
L. Du,
R. Ehlers,
H. Elfner,
R. J. Fries,
C. Gale,
Y. He,
B. V. Jacak,
P. M. Jacobs,
S. Jeon,
Y. Ji,
F. Jonas,
L. Kasper,
M. Kordell II,
A. Kumar,
R. Kunnawalkam-Elayavalli,
J. Latessa,
Y. -J. Lee
, et al. (27 additional authors not shown)
Abstract:
We investigate medium-induced modifications to jet substructure observables that characterize hard splitting patterns in central Pb-Pb collisions at the top energy of the Large Hadron Collider (LHC). Using a multistage Monte Carlo simulation of in-medium jet shower evolution, we explore flavor-dependent medium effects through simulations of inclusive and $γ$-tagged jets. The results show that quark jets undergo a non-monotonic modification compared to gluon jets in observables such as the Pb-Pb to $p$-$p$ ratio of the Soft Drop prong angle $r_g$, the relative prong transverse momentum $k_{T,g}$ and the groomed mass $m_g$ distributions. Due to this non-monotonic modification, $γ$-tagged jets, enriched in quark jets, provide surprisingly clear signals of medium-induced structural modifications, distinct from effects dominated by selection bias. This work highlights the potential of hard substructures in $γ$-tagged jets as powerful tools for probing the jet-medium interactions in high-energy heavy-ion collisions. All simulations for $γ$-tagged jet analyses carried out in this paper used triggered events containing at least one hard photon, which highlights the utility of these observables for future Bayesian analysis.
Submitted 30 March, 2025;
originally announced March 2025.
-
An In-Situ Spatial-Temporal Sequence Detector for Neuromorphic Vision Sensor Empowered by High Density Vertical NAND Storage
Authors:
Zijian Zhao,
Varun Darshana Parekh,
Po-Kai Hsu,
Yixin Qin,
Yiming Song,
A N M Nafiul Islam,
Ningyuan Cao,
Siddharth Joshi,
Thomas Kämpfe,
Moonyoung Jung,
Kwangyou Seo,
Kwangsoo Kim,
Wanki Kim,
Daewon Ha,
Sourav Dutta,
Abhronil Sengupta,
Xiao Gong,
Shimeng Yu,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Neuromorphic vision sensors require efficient real-time pattern recognition, yet conventional architectures struggle with energy and latency constraints. Here, we present a novel in-situ spatiotemporal sequence detector that leverages vertical NAND storage to achieve massively parallel pattern detection. By encoding each cell with two single-transistor-based multi-level cell (MLC) memory elements, such as ferroelectric field-effect transistors (FeFETs), and mapping a pixel's temporal sequence onto consecutive word lines (WLs), we enable direct temporal pattern detection within NAND strings. Each NAND string serves as a dedicated reference for a single pixel, while different blocks store patterns for distinct pixels, allowing large-scale spatial-temporal pattern recognition via simple direct bit-line (BL) sensing, a well-established operation in vertical NAND storage. We experimentally validate our approach at both the cell and array levels, demonstrating that vertical NAND-based detector achieves more than six orders of magnitude improvement in energy efficiency and more than three orders of magnitude reduction in latency compared to conventional CPU-based methods. These findings establish vertical NAND storage as a scalable and energy-efficient solution for next-generation neuromorphic vision processing.
Submitted 30 March, 2025;
originally announced March 2025.
-
Partial Quantum Shadow Tomography for Structured Operators and its Experimental Demonstration using NMR
Authors:
Aniket Sengupta,
Arijit Chatterjee,
G. J. Sreejith,
T. S. Mahesh
Abstract:
Quantum shadow tomography based on the classical shadow representation provides an efficient way to estimate properties of an unknown quantum state without performing a full quantum state tomography. In scenarios where estimating the expectation values for only certain classes of observables is required, obtaining information about the entire density matrix is unnecessary. We propose a partial quantum shadow tomography protocol, which allows estimation of a subset of density matrix elements contributing to the expectation values of certain classes of structured observables. This method utilizes tomographically incomplete subsets of single qubit Pauli basis measurements to perform partial shadow tomography, making it experimentally more efficient. We demonstrate the advantage over unitary $k$-designs such as Clifford, full Pauli basis, and methods utilizing mutually unbiased bases by numerically analyzing the protocol for structured density matrices and observables. We experimentally demonstrate the partial shadow estimation scheme for a wide class of two-qubit states (pure, entangled, and mixed) in the nuclear magnetic resonance (NMR) platform, which relies on ensemble-based measurements. The full density matrix experimentally reconstructed by combining different partial estimators produces fidelities exceeding 97%.
Submitted 24 March, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
ArtInsight: Enabling AI-Powered Artwork Engagement for Mixed Visual-Ability Families
Authors:
Arnavi Chheda-Kothary,
Ritesh Kanchi,
Chris Sanders,
Kevin Xiao,
Aditya Sengupta,
Melanie Kneitmix,
Jacob O. Wobbrock,
Jon E. Froehlich
Abstract:
We introduce ArtInsight, a novel AI-powered system to facilitate deeper engagement with child-created artwork in mixed visual-ability families. ArtInsight leverages large language models (LLMs) to craft a respectful and thorough initial description of a child's artwork, and provides: creative AI-generated descriptions for a vivid overview, audio recording to capture the child's own description of their artwork, and a set of AI-generated questions to facilitate discussion between blind or low-vision (BLV) family members and their children. Alongside ArtInsight, we also contribute a new rubric to score AI-generated descriptions of child-created artwork and an assessment of state-of-the-art LLMs. We evaluated ArtInsight with five groups of BLV family members and their children, and as a case study with one BLV child therapist. Our findings highlight a preference for ArtInsight's longer, artistically-tailored descriptions over those generated by existing BLV AI tools. Participants highlighted the creative description and audio recording components as most beneficial, with the former helping ``bring a picture to life'' and the latter centering the child's narrative to generate context-aware AI responses. Our findings reveal different ways that AI can be used to support art engagement, including before, during, and after interaction with the child artist, as well as expectations that BLV adults and their sighted children have about AI-powered tools.
Submitted 10 March, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Towards Robustness Across Cosmological Simulation Models TNG, SIMBA, ASTRID, and EAGLE
Authors:
Yongseok Jo,
Shy Genel,
Anirvan Sengupta,
Benjamin Wandelt,
Rachel Somerville,
Francisco Villaescusa-Navarro
Abstract:
The rapid advancement of large-scale cosmological simulations has opened new avenues for cosmological and astrophysical research. However, the increasing diversity among cosmological simulation models presents a challenge to robustness. In this work, we develop the Model-Insensitive ESTimator (MIEST), a machine that can robustly estimate the cosmological parameters, $Ω_m$ and $σ_8$, from neutral hydrogen maps of simulation models in the CAMELS project: TNG, SIMBA, ASTRID, and EAGLE. An estimator is considered robust if it possesses a consistent predictive power across all simulations, including those used during the training phase. We train our machine using multiple simulation models and ensure that it only extracts common features between the models while disregarding the model-specific features. This allows us to develop a novel model that is capable of accurately estimating parameters across a range of simulation models, without being biased towards any particular model. Upon investigation of the latent space, a set of summary statistics, we find that the implementation of robustness leads to the blending of latent variables across different models, demonstrating the removal of model-specific features. In comparison to a standard machine lacking robustness, the average performance of MIEST on the unseen simulations during the training phase has been improved by $\sim17$% for $Ω_m$ and $\sim 38$% for $σ_8$. By using a machine learning approach that can extract robust, yet physical features, we hope to improve our understanding of galaxy formation and evolution in a (subgrid) model-insensitive manner, and ultimately, gain insight into the underlying physical processes responsible for robustness. This is a Learning the Universe publication.
Submitted 18 February, 2025;
originally announced February 2025.
-
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Authors:
Ayan Sengupta,
Yash Goel,
Tanmoy Chakraborty
Abstract:
Neural scaling laws have revolutionized the design and optimization of large-scale AI models by revealing predictable relationships between model size, dataset volume, and computational resources. Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies. However, recent studies highlighted their limitations across architectures, modalities, and deployment contexts. Sparse models, mixture-of-experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns. Moreover, scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches. In this survey, we synthesize insights from over 50 studies, examining the theoretical foundations, empirical findings, and practical implications of scaling laws. We also explore key challenges, including data efficiency, inference scaling, and architecture-specific constraints, advocating for adaptive scaling strategies tailored to real-world applications. We suggest that while scaling laws provide a useful guide, they do not always generalize across all architectures and training strategies.
Submitted 26 May, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
In-context denoising with one-layer transformers: connections between attention and associative memory retrieval
Authors:
Matthew Smart,
Alberto Bietti,
Anirvan M. Sengupta
Abstract:
We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.
Submitted 6 June, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
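The correspondence described in the abstract above, where one attention readout equals a single gradient descent update on a dense associative memory (DAM) energy, can be checked numerically. The sketch below assumes one common form of the energy, following Ramsauer et al.: E(x) = ½‖x‖² − (1/β) log Σᵢ exp(β ξᵢ·x), with a unit step size. The paper's exact energy, scaling, and step size may differ, and all function names here are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_readout(query, context, beta):
    """Softmax-weighted average of context tokens (keys and values tied)."""
    weights = softmax([beta * dot(xi, query) for xi in context])
    return [sum(w * xi[k] for w, xi in zip(weights, context))
            for k in range(len(query))]


def energy(x, context, beta):
    """Assumed DAM energy: 0.5*|x|^2 - (1/beta) * logsumexp(beta * xi . x)."""
    scores = [beta * dot(xi, x) for xi in context]
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return 0.5 * dot(x, x) - lse / beta


def gradient_step(x, context, beta, step=1.0, eps=1e-6):
    """One gradient descent update, using a central-difference gradient."""
    grad = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        grad.append((energy(xp, context, beta) - energy(xm, context, beta))
                    / (2.0 * eps))
    return [xk - step * g for xk, g in zip(x, grad)]


if __name__ == "__main__":
    # Toy context tokens (stored memories) and a noisy query; values are illustrative.
    context = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    query = [0.8, 0.3]
    print(attention_readout(query, context, beta=2.0))
    print(gradient_step(query, context, beta=2.0))
```

Under these assumptions the gradient is ∇E(x) = x − Σᵢ softmaxᵢ(β ξᵢ·x) ξᵢ, so a step of size one starting from the query lands exactly on the attention output, and the two printed vectors agree up to finite-difference error.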
-
Hybrid Hadronization -- A Study of In-Medium Hadronization of Jets
Authors:
A. Sengupta,
R. J. Fries,
M. Kordell II,
B. Kim,
A. Angerami,
R. Arora,
S. A. Bass,
Y. Chen,
R. Datta,
L. Du,
R. Ehlers,
H. Elfner,
C. Gale,
Y. He,
B. V. Jacak,
P. M. Jacobs,
S. Jeon,
Y. Ji,
F. Jonas,
L. Kasper,
A. Kumar,
R. Kunnawalkam-Elayavalli,
J. Latessa,
Y. -J. Lee,
R. Lemmon
, et al. (28 additional authors not shown)
Abstract:
QCD jets are considered important probes for quark gluon plasma created in collisions of nuclei at high energies. Their parton showers are significantly altered if they develop inside of a deconfined medium. Hadronization of jets is also thought to be affected by the presence of quarks and gluons. We present a systematic study of the effects of a thermal bath of partons on the hadronization of parton showers. We use the JETSCAPE framework to create parton showers both in vacuum and in a brick of quark gluon plasma. The brick setup allows important parameters, like the size of the plasma as well as the collective flow of partons, to be varied systematically. We hadronize the parton showers using Hybrid Hadronization, which permits shower partons to form strings with thermal partons, or to recombine directly with thermal partons as well as with each other. We find a sizeable amount of interaction of shower partons with thermal partons during hadronization, indicating a natural continuation of the interaction of jet and medium during this stage. The observed effects grow with the size of the medium. Collective flow easily transfers from the thermal partons onto the emerging jet hadrons. We also see a significant change in hadron chemistry as expected in the presence of quark recombination processes.
Submitted 27 January, 2025;
originally announced January 2025.
-
The Signals of the Doomsday
Authors:
Amartya Sengupta,
Dejan Stojkovic,
De-Chang Dai
Abstract:
The measured standard model parameters indicate that we might live in a false Higgs vacuum, possibly with a very long lifetime. However, small black holes can serve as catalysers and significantly speed up the phase transition. In fact, bubbles of true vacuum might already exist in our universe. We calculate the spectrum of Higgs particles produced by such a bubble, and use event generators to study their decay and subsequent evolution of the decay products to obtain the spectrum of emitted photons and neutrinos as a long-range signature. If the propagation of the bubble walls slows down due to interaction with the surrounding matter and plasma, these signals can reach us before the bubble wall hits us, thus representing the signals of the doomsday.
Submitted 10 March, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Authors:
Ayan Sengupta,
Siddhant Chaudhary,
Tanmoy Chakraborty
Abstract:
The ever-increasing size of large language models (LLMs) presents significant challenges for deployment due to their heavy computational and memory requirements. Current model pruning techniques attempt to alleviate these issues by relying heavily on external calibration datasets to determine which parameters to prune or compress, thus limiting their flexibility and scalability across different compression ratios. Moreover, these methods often cause severe performance degradation, particularly in downstream tasks, when subjected to higher compression rates. In this paper, we propose PruneNet, a novel model compression method that addresses these limitations by reformulating model pruning as a policy learning process. PruneNet decouples the pruning process from the model architecture, eliminating the need for calibration datasets. It learns a stochastic pruning policy to assess parameter importance solely based on intrinsic model properties while preserving the spectral structure to minimize information loss. PruneNet can compress the LLaMA-2-7B model in just 15 minutes, achieving over 80% retention of its zero-shot performance with a 30% compression ratio, outperforming existing methods that retain only 75% performance. Furthermore, on complex multitask language understanding tasks, PruneNet demonstrates its robustness by preserving up to 80% performance of the original model, proving itself a superior alternative to conventional structured compression techniques.
Submitted 28 February, 2025; v1 submitted 25 January, 2025;
originally announced January 2025.
-
Search for continuous gravitational waves from known pulsars in the first part of the fourth LIGO-Virgo-KAGRA observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi,
A. Al-Jodah,
C. Alléné
, et al. (1794 additional authors not shown)
Abstract:
Continuous gravitational wave (CW) emission from neutron stars carries information about their internal structure and equation of state, and it can provide tests of General Relativity. We present a search for CWs from a set of 45 known pulsars in the first part of the fourth LIGO--Virgo--KAGRA observing run, known as O4a. We conducted a targeted search for each pulsar using three independent analysis methods considering the single-harmonic and the dual-harmonic emission models. We find no evidence of a CW signal in O4a data for either model and set upper limits on the signal amplitude and on the ellipticity, which quantifies the asymmetry in the neutron star mass distribution. For the single-harmonic emission model, 29 targets have the upper limit on the amplitude below the theoretical spin-down limit. The lowest upper limit on the amplitude is $6.4\!\times\!10^{-27}$ for the young energetic pulsar J0537-6910, while the lowest constraint on the ellipticity is $8.8\!\times\!10^{-9}$ for the bright nearby millisecond pulsar J0437-4715. Additionally, for a subset of 16 targets we performed a narrowband search that is more robust regarding the emission model, with no evidence of a signal. We also found no evidence of non-standard polarizations as predicted by the Brans-Dicke theory.
Submitted 2 January, 2025;
originally announced January 2025.
-
Hard Photon Triggered Jets in $p$-$p$ and $A$-$A$ Collisions
Authors:
C. Sirimanna,
Y. Tachibana,
A. Majumder,
A. Angerami,
R. Arora,
S. A. Bass,
Y. Chen,
R. Datta,
L. Du,
R. Ehlers,
H. Elfner,
R. J. Fries,
C. Gale,
Y. He,
B. V. Jacak,
P. M. Jacobs,
S. Jeon,
Y. Ji,
F. Jonas,
L. Kasper,
M. Kordell II,
A. Kumar,
R. Kunnawalkam-Elayavalli,
J. Latessa,
Y. -J. Lee
, et al. (27 additional authors not shown)
Abstract:
An investigation of high transverse momentum (high-$p_T$) photon triggered jets in proton-proton ($p$-$p$) and ion-ion ($A$-$A$) collisions at $\sqrt{s_{NN}} = 0.2$ and $5.02~\mathrm{TeV}$ is carried out, using the multistage description of in-medium jet evolution. Monte Carlo simulations of hard scattering and energy loss in heavy-ion collisions are performed using parameters tuned in a previous study of the nuclear modification factor ($R_{AA}$) for inclusive jets and high-$p_T$ hadrons. We obtain a good reproduction of the experimental data for photon triggered jet $R_{AA}$, as measured by the ATLAS detector, the distribution of the ratio of jet to photon $p_T$ ($X_{\rm J γ}$), measured by both CMS and ATLAS, and the photon-jet azimuthal correlation as measured by CMS. We obtain a moderate description of the photon triggered jet $I_{AA}$, as measured by STAR. A noticeable improvement in the comparison is observed when one goes beyond prompt photons and includes bremsstrahlung and decay photons, revealing their significance in certain kinematic regions, particularly at $X_{Jγ} > 1$. Moreover, azimuthal angle correlations demonstrate a notable impact of non-prompt photons on the distribution, emphasizing their role in accurately describing experimental results. This work highlights the success of the multistage model of jet modification to straightforwardly predict (this set of) photon triggered jet observables. This comparison, along with the role played by non-prompt photons, has important consequences on the inclusion of such observables in a future Bayesian analysis.
Submitted 27 December, 2024;
originally announced December 2024.
-
Deep Learning Based Superconductivity: Prediction and Experimental Tests
Authors:
Daniel Kaplan,
Adam Zhang,
Joanna Blawat,
Rongying Jin,
Robert J. Cava,
Viktor Oudovenko,
Gabriel Kotliar,
Anirvan M. Sengupta,
Weiwei Xie
Abstract:
The discovery of novel superconducting materials is a longstanding challenge in materials science, with a wealth of potential for applications in energy, transportation, and computing. Recent advances in artificial intelligence (AI) have enabled expediting the search for new materials by efficiently utilizing vast materials databases. In this study, we developed an approach based on deep learning (DL) to predict new superconducting materials. We have synthesized a compound derived from our DL network and confirmed its superconducting properties in agreement with our prediction. Our approach is also compared to previous work based on random forests (RFs). In particular, RFs require knowledge of the chemical properties of the compound, while our neural net inputs depend solely on the chemical composition. With the help of hints from our network, we discover a new ternary compound $\textrm{Mo}_{20}\textrm{Re}_{6}\textrm{Si}_{4}$, which becomes superconducting below 5.4 K. We further discuss the existing limitations and challenges associated with using AI for prediction, along with potential future research directions.
Submitted 17 December, 2024;
originally announced December 2024.
-
Learning interactions between Rydberg atoms
Authors:
Olivier Simard,
Anna Dawid,
Joseph Tindall,
Michel Ferrero,
Anirvan M. Sengupta,
Antoine Georges
Abstract:
Quantum simulators have the potential to solve quantum many-body problems that are beyond the reach of classical computers, especially when they feature long-range entanglement. To fulfill their prospects, quantum simulators must be fully controllable, allowing for precise tuning of the microscopic physical parameters that define their implementation. We consider Rydberg-atom arrays, a promising platform for quantum simulations. Experimental control of such arrays is limited by the imprecision in the optical tweezer positions when assembling the array, which introduces uncertainties in the simulated Hamiltonian. In this work, we introduce a scalable approach to Hamiltonian learning using graph neural networks (GNNs). We employ the Density Matrix Renormalization Group (DMRG) to generate ground-state snapshots of the transverse field Ising model realized by the array, for many realizations of the Hamiltonian parameters. Correlation functions reconstructed from these snapshots serve as input data to carry out the training. We demonstrate that our GNN model has a remarkable capacity to extrapolate beyond its training domain, both regarding the size and the shape of the system, yielding an accurate determination of the Hamiltonian parameters with a minimal set of measurements. We prove a theorem establishing a bijective correspondence between the correlation functions and the interaction parameters in the Hamiltonian, which provides a theoretical foundation for our learning algorithm. Our work could open the road to feedback control of the positions of the optical tweezers, hence providing a decisive improvement of analog quantum simulators.
Submitted 16 December, 2024;
originally announced December 2024.
-
Variational Dual Solutions of Chern-Simons Theory
Authors:
Amit Acharya,
Janusz Ginster,
Ambar N. Sengupta
Abstract:
A scheme for generating weakly lower semi-continuous action functionals corresponding to the Euler-Lagrange equations of Chern-Simons theory is described. Coercivity is deduced for such a functional in appropriate function spaces to prove the existence of a minimizer, which constitutes a solution to the Euler-Lagrange equations of Chern-Simons theory in a relaxed sense. A geometric analysis is also made, especially for the gauge group SU(2), relating connection forms on the bundle to corresponding forms in the dual scheme.
Submitted 26 November, 2024;
originally announced November 2024.
-
Probing dark matter halo profiles with multi-band observations of gravitational waves
Authors:
Divya Tahelyani,
Arpan Bhattacharyya,
Anand S. Sengupta
Abstract:
In this paper, we evaluate the potential of multiband gravitational wave observations from a deciHz space-based detector and third-generation ground-based gravitational wave detectors to constrain the properties of dark matter spikes around intermediate-mass ratio inspirals (IMRIs). The presence of dark matter influences the orbital evolution of the secondary compact object through dynamical friction, which leads to a phase shift in the gravitational waveform compared to the vacuum case. Our analysis shows that the proposed Indian space-based detector GWSat, operating in the deciHz frequency band, provides the most stringent constraints on the dark matter spike parameters, as IMRIs spend a significant portion of their inspiral phase within its sensitivity range. While third-generation ground-based detectors such as the Einstein Telescope and Cosmic Explorer offer additional constraints, their contribution is somewhat limited, particularly for higher-mass systems where the signal duration in their frequency bands is shorter. However, for systems with detector-frame total masses $M_z < 400 \rm M_{\odot}$, Cosmic Explorer and Einstein Telescope could improve the estimation of the chirp mass, symmetric mass ratio, luminosity distance, and dark matter spike power-law index by more than $15\%$. Nonetheless, their impact on the constraint of spike density is minimal. These results highlight the crucial role of deciHz space-based detectors in probing dark matter interactions with gravitational wave sources.
Submitted 3 April, 2025; v1 submitted 21 November, 2024;
originally announced November 2024.
-
Comparing design and off-design aerodynamic performance of a natural laminar airfoil
Authors:
Aditi Sengupta,
Abhijeet Guha
Abstract:
Natural laminar flow airfoils are essential technologies designed to reduce drag and significantly enhance aerodynamic performance. A notable example is the SHM1 airfoil, created to meet the requirements of the small-business Honda jet. This airfoil has undergone extensive testing across various operational conditions, including low-speed wind tunnel tests and flight tests across a range of Reynolds numbers and free-stream Mach numbers, as detailed in "Natural-laminar-flow airfoil development for a lightweight business jet" by Fujino et al., J. Aircraft, 40(4), 2003. Additionally, investigations into drag-divergence behavior have been conducted using a transonic wind tunnel, with subsequent studies focusing on transonic shock boundary layer interactions through both experimental and numerical approaches. This study employs a series of numerical simulations to analyze the flow physics and aerodynamic performance across different free-stream Mach numbers in the subsonic and transonic regimes. This is achieved by examining computed instantaneous numerical Schlieren for various design conditions (such as low speed, climb, and cruise) and off-design scenarios (including transonic shock emergence, drag-divergence, and shock-induced separation). The dominant time scales, the time-averaged load distributions and boundary layer parameters are compared to provide a comprehensive overview of the SHM1's aerodynamics, establishing benchmark results for optimization of various flow separation and shock control techniques.
Submitted 19 November, 2024;
originally announced November 2024.
-
Effect of Gaussian wake amplitude on wake-induced transition for a T106A low pressure turbine cascade
Authors:
Aditi Sengupta
Abstract:
The wake-induced transition on the suction surface of a T106A low-pressure turbine (LPT) blade is investigated through a series of implicit large eddy simulations, solving the two-dimensional (2D) compressible Navier-Stokes equations (NSE). The impact of the incoming Gaussian wake amplitude on the blade's profile loss and associated boundary layer parameters is examined, revealing a 50\% reduction in skin friction drag at the highest amplitude. The results indicate that increasing wake amplitude leads to delayed separation and earlier reattachment, resulting in reduced separated flow. The vorticity and enstrophy dynamics during the transition process under varying wake amplitudes reveal characteristic features of wake-induced transition, such as puffs, streaks, and turbulent spots. The periodic passing of wakes induces intermittent "calmed regions", which suppress flow separation and improve profile loss at low Reynolds numbers (Re), typically found in LPTs. The energy budget, accounting for both translational and rotational energy via the turbulent kinetic energy (TKE) and compressible enstrophy transport equation (CETE), respectively, shows trends with increasing wake amplitude. The relative contribution to TKE production and the roles of baroclinicity, compressibility, and viscous terms are explained.
Submitted 19 November, 2024;
originally announced November 2024.
-
Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
Authors:
Ayan Sengupta,
Vaibhav Seth,
Arinjay Pathak,
Natraj Raman,
Sriram Gopalakrishnan,
Tanmoy Chakraborty
Abstract:
Large Language Models (LLMs) are highly resource-intensive to fine-tune due to their enormous size. While low-rank adaptation is a prominent parameter-efficient fine-tuning approach, it suffers from sensitivity to hyperparameter choices, leading to instability in model performance on fine-tuning downstream tasks. This paper highlights the importance of effective parameterization in low-rank fine-tuning to reduce estimator variance and enhance the stability of final model outputs. We propose MonteCLoRA, an efficient fine-tuning technique, employing Monte Carlo estimation to learn an unbiased posterior estimation of low-rank parameters with low expected variance, which stabilizes fine-tuned LLMs with only O(1) additional parameters. MonteCLoRA shows significant improvements in accuracy and robustness, achieving up to 3.8% higher accuracy and 8.6% greater robustness than existing efficient fine-tuning methods on natural language understanding tasks with pre-trained RoBERTa-base. Furthermore, in generative tasks with pre-trained LLaMA-1-7B, MonteCLoRA demonstrates robust zero-shot performance with 50% lower variance than the contemporary efficient fine-tuning methods. The theoretical and empirical results presented in the paper underscore how parameterization and hyperpriors balance exploration-exploitation in the low-rank parametric space, therefore leading to more optimal and robust parameter estimation during efficient fine-tuning.
Submitted 8 November, 2024; v1 submitted 6 November, 2024;
originally announced November 2024.
-
Tipping points in fitness landscape of heterogeneous populations
Authors:
Sumana Bhattacharyya,
Uttam Singh,
Anupam Sengupta
Abstract:
Predicting fitness of biologically-active populations, communities or systems in fluctuating environments is a long-standing challenge. Phenotypic plasticity and bet-hedging strategy, two key evolutionary traits living systems harness to optimize fitness in dynamic environments, have been widely reported, yet how interplays therein could mediate fitness landscapes of heterogeneous populations remains unknown. Leveraging the financial asset pricing model, here we provide a dynamical framework for fitness of heterogeneous populations, underpinned by the interrelations between sub-populations exhibiting phenotypic plasticity and bet-hedging. Our framework, independent of the definition of fitness, employs a nonlinear difference equation to present fitness dynamics, and capture the emergence of tipping points, marking the onset of critical state transitions which lead to catastrophic shifts. This study identifies limits on the selective advantage conferred by bet-hedging through reduction in the temporal variance of fitness, with far-reaching ramifications on our current understanding of hedging-mediated fitness enhancement of a population. The lower bound of the effective fitness variance is set by a maximum number of bet-hedgers, beyond which the fitness landscape approaches critical transition, as confirmed by critical slowing down in the vicinity of tipping points. We estimate the scaling law for the critical slowing down numerically and derive the characteristic recovery time for heterogeneous populations. Taken together, our work provides a generic theoretical framework to quantify fitness dynamics and predict critical transitions in heterogeneous populations. The results can be extended further to model fitness landscapes of natural and synthetic multi-species consortia exposed to environmental fluctuations mimicking climatic shifts and immunopathological settings.
Submitted 23 October, 2024;
originally announced October 2024.
-
Search for gravitational waves emitted from SN 2023ixf
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi,
A. Al-Jodah,
C. Alléné,
A. Allocca
, et al. (1758 additional authors not shown)
Abstract:
We present the results of a search for gravitational-wave transients associated with core-collapse supernova SN 2023ixf, which was observed in the galaxy Messier 101 via optical emission on 2023 May 19th, during the LIGO-Virgo-KAGRA 15th Engineering Run. We define a five-day on-source window during which an accompanying gravitational-wave signal may have occurred. No gravitational waves have been identified in data when at least two gravitational-wave observatories were operating, which covered $\sim 14\%$ of this five-day window. We report the search detection efficiency for various possible gravitational-wave emission models. Considering the distance to M101 (6.7 Mpc), we derive constraints on the gravitational-wave emission mechanism of core-collapse supernovae across a broad frequency spectrum, ranging from 50 Hz to 2 kHz where we assume the gravitational-wave emission occurred when coincident data are available in the on-source window. Considering an ellipsoid model for a rotating proto-neutron star, our search is sensitive to gravitational-wave energy $1 \times 10^{-4} M_{\odot} c^2$ and luminosity $2.6 \times 10^{-4} M_{\odot} c^2/s$ for a source emitting at 82 Hz. These constraints are around an order of magnitude more stringent than those obtained so far with gravitational-wave data. The constraint on the ellipticity of the proto-neutron star that is formed is as low as 1.08, at frequencies above 1200 Hz, surpassing past results.
Submitted 11 March, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
A search using GEO600 for gravitational waves coincident with fast radio bursts from SGR 1935+2154
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi,
A. Al-Jodah,
C. Alléné
, et al. (1758 additional authors not shown)
Abstract:
The magnetar SGR 1935+2154 is the only known Galactic source of fast radio bursts (FRBs). FRBs from SGR 1935+2154 were first detected by CHIME/FRB and STARE2 in 2020 April, after the conclusion of the LIGO, Virgo, and KAGRA Collaborations' O3 observing run. Here we analyze four periods of gravitational wave (GW) data from the GEO600 detector coincident with four periods of FRB activity detected by CHIME/FRB, as well as X-ray glitches and X-ray bursts detected by NICER and NuSTAR close to the time of one of the FRBs. We do not detect any significant GW emission from any of the events. Instead, using a short-duration GW search (for bursts $\leq$ 1 s) we derive 50\% (90\%) upper limits of $10^{48}$ ($10^{49}$) erg for GWs at 300 Hz and $10^{49}$ ($10^{50}$) erg at 2 kHz, and constrain the GW-to-radio energy ratio to $\leq 10^{14} - 10^{16}$. We also derive upper limits from a long-duration search for bursts with durations between 1 and 10 s. These represent the strictest upper limits on concurrent GW emission from FRBs.
Submitted 21 May, 2025; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Accelerated parameter estimation of supermassive black hole binaries in LISA using a meshfree approximation
Authors:
Abhishek Sharma,
Anand S. Sengupta,
Suvodip Mukherjee
Abstract:
The Laser Interferometer Space Antenna (LISA) will be capable of detecting gravitational waves (GWs) in the milli-Hertz band. Among various sources, LISA will detect the coalescence of supermassive black hole binaries (SMBHBs). Accurate and rapid inference of parameters for such sources will be important for potential electromagnetic follow-up efforts. Rapid Bayesian inference with LISA includes additional complexities as compared to current generation terrestrial detectors in terms of time and frequency dependent antenna response functions. In this work, we extend a recently developed, computationally efficient technique that uses meshfree interpolation methods to accelerate Bayesian reconstruction of compact binaries. Originally developed for second-generation terrestrial detectors, this technique is now adapted for LISA parameter estimation. Using the full inspiral, merger, and ringdown waveform (PhenomD) and assuming rigid adiabatic antenna response function, we show faithful inference of SMBHB parameters from GW signals embedded in stationary, Gaussian instrumental noise. We discuss the computational cost and performance of the meshfree approximation method in estimating the GW source parameters.
Submitted 22 February, 2025; v1 submitted 21 September, 2024;
originally announced September 2024.
-
Harnessing AI data-driven global weather models for climate attribution: An analysis of the 2017 Oroville Dam extreme atmospheric river
Authors:
Jorge Baño-Medina,
Agniv Sengupta,
Allison Michaelis,
Luca Delle Monache,
Julie Kalansky,
Duncan Watson-Parris
Abstract:
AI data-driven models (Graphcast, Pangu Weather, Fourcastnet, and SFNO) are explored for storyline-based climate attribution due to their short inference times, which can accelerate the number of events studied, and provide real time attributions when public attention is heightened. The analysis is framed on the extreme atmospheric river episode of February 2017 that contributed to the Oroville dam spillway incident in Northern California. Past and future simulations are generated by perturbing the initial conditions with the pre-industrial and the late-21st century temperature climate change signals, respectively. The simulations are compared to results from a dynamical model which represents plausible pseudo-realities under both climate environments. Overall, the AI models show promising results, projecting a 5-6 % increase in the integrated water vapor over the Oroville dam in the present day compared to the pre-industrial, in agreement with the dynamical model. Different geopotential-moisture-temperature dependencies are unveiled for each of the AI-models tested, providing valuable information for understanding the physicality of the attribution response. However, the AI models tend to simulate weaker attribution values than the pseudo-reality imagined by the dynamical model, suggesting some reduced extrapolation skill, especially for the late-21st century regime. Large ensembles generated with an AI model (>500 members) produced statistically significant present-day to pre-industrial attribution results, unlike the >20-member ensemble from the dynamical model. This analysis highlights the potential of AI models to conduct attribution analysis, while emphasizing future lines of work on explainable artificial intelligence to gain confidence in these tools, which can enable reliable attribution studies in real-time.
Submitted 17 September, 2024;
originally announced September 2024.
-
Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models
Authors:
Aradhye Agarwal,
Suhas K Ramesh,
Ayan Sengupta,
Tanmoy Chakraborty
Abstract:
Fine-tuning large language models (LLMs) on downstream tasks requires substantial computational resources. Selective PEFT, a class of parameter-efficient fine-tuning (PEFT) methodologies, aims to mitigate these computational challenges by selectively fine-tuning only a small fraction of the model parameters. Although parameter-efficient, these techniques often fail to match the performance of fully fine-tuned models, primarily due to inherent biases introduced during parameter selection. Traditional selective PEFT techniques use a fixed set of parameters selected using different importance heuristics, failing to capture parameter importance dynamically and often leading to suboptimal performance. We introduce $\text{ID}^3$, a novel selective PEFT method that calculates parameter importance continually, and dynamically unmasks parameters by balancing exploration and exploitation in parameter selection. Our empirical study on 16 tasks spanning natural language understanding, mathematical reasoning and summarization demonstrates the effectiveness of our method compared to fixed-masking selective PEFT techniques. We analytically show that $\text{ID}^3$ reduces the number of gradient updates by a factor of two, enhancing computational efficiency. Since $\text{ID}^3$ is robust to random initialization of neurons and operates directly on the optimization process, it is highly flexible and can be integrated with existing additive and reparametrization-based PEFT techniques such as adapters and LoRA respectively.
Submitted 23 June, 2025; v1 submitted 26 August, 2024;
originally announced August 2024.
-
Bayesian Inference analysis of jet quenching using inclusive jet and hadron suppression measurements
Authors:
R. Ehlers,
Y. Chen,
J. Mulligan,
Y. Ji,
A. Kumar,
S. Mak,
P. M. Jacobs,
A. Majumder,
A. Angerami,
R. Arora,
S. A. Bass,
R. Datta,
L. Du,
H. Elfner,
R. J. Fries,
C. Gale,
Y. He,
B. V. Jacak,
S. Jeon,
F. Jonas,
L. Kasper,
M. Kordell II,
R. Kunnawalkam-Elayavalli,
J. Latessa,
Y. -J. Lee
, et al. (28 additional authors not shown)
Abstract:
The JETSCAPE Collaboration reports a new determination of the jet transport parameter $\hat{q}$ in the Quark-Gluon Plasma (QGP) using Bayesian Inference, incorporating all available inclusive hadron and jet yield suppression data measured in heavy-ion collisions at RHIC and the LHC. This multi-observable analysis extends the previously published JETSCAPE Bayesian Inference determination of $\hat{q}$, which was based solely on a selection of inclusive hadron suppression data. JETSCAPE is a modular framework incorporating detailed dynamical models of QGP formation and evolution, and jet propagation and interaction in the QGP. Virtuality-dependent partonic energy loss in the QGP is modeled as a thermalized weakly-coupled plasma, with parameters determined from Bayesian calibration using soft-sector observables. This Bayesian calibration of $\hat{q}$ utilizes Active Learning, a machine-learning approach, for efficient exploitation of computing resources. The experimental data included in this analysis span a broad range in collision energy and centrality, and in transverse momentum. In order to explore the systematic dependence of the extracted parameter posterior distributions, several different calibrations are reported, based on combined jet and hadron data; on jet or hadron data separately; and on restricted kinematic or centrality ranges of the jet and hadron data. Tension is observed in comparison of these variations, providing new insights into the physics of jet transport in the QGP and its theoretical formulation.
Submitted 28 August, 2024; v1 submitted 15 August, 2024;
originally announced August 2024.
-
Substrate stiffness modulates bacterial adhesion and diversity of adherent phenotypes across growth stages
Authors:
René Riedel,
Garima Rani,
Anupam Sengupta
Abstract:
Surface-adhesion and stiffness of underlying substrates mediate geometry, mechanics and self-organization of bacterial colonies. Recent studies have qualitatively indicated that stiffness may impact bacterial attachment, yet the variation of cell-to-surface adhesion with substrate stiffness remains to be quantified. Here, by developing a cell-level Force Distance Spectroscopy (FDS) technique based on Atomic Force Microscopy (AFM), we simultaneously quantify the cell-surface adhesion alongside stiffness of the underlying substrates to reveal stiffness-dependent adhesion in the phototrophic bacterium Chromatium okenii. As stiffness of the soft substrate, modelled via low-melting-point (LMP) agarose pad, was varied between 20 kPa and 120 kPa by changing agarose concentrations, we observe a progressive increase of the mean adhesion force by over an order of magnitude, from 0.21 (+/-0.10) nN to 2.42 (+/-1.16) nN. In contrast, passive polystyrene (PS) microparticles of comparable dimensions showed no perceptible change in their surface adhesion. Furthermore, for Escherichia coli, the cell-surface adhesion varied between 0.29 (+/-0.17) nN and 0.39 (+/-0.20) nN, showing a weak dependence on the substrate stiffness, thus suggesting that the stiffness-modulated adhesion is a species-specific trait. Finally, by quantifying the adhesion of C. okenii populations across growth stages, we report an emergent co-existence of weakly and strongly adherent sub-populations, demonstrating a diversification of adherent phenotypes over time. Taken together, these findings suggest that bacteria, depending on the species and their physiological stage, actively modulate cell-to-surface adhesion in response to substrate stiffness, and leverage it as a functional trait to modulate initial attachment and colonization on soft substrates during early stages of biofilm development.
Submitted 30 July, 2024;
originally announced July 2024.
-
A soft-hard framework with exact four momentum conservation for small systems
Authors:
I. Soudi,
W. Zhao,
A. Majumder,
C. Shen,
J. H. Putschke,
B. Boudreaux,
A. Angerami,
R. Arora,
S. A. Bass,
Y. Chen,
R. Datta,
L. Du,
R. Ehlers,
H. Elfner,
R. J. Fries,
C. Gale,
Y. He,
B. V. Jacak,
P. M. Jacobs,
S. Jeon,
Y. Ji,
L. Kasper,
M. Kelsey,
M. Kordell II,
A. Kumar
, et al. (28 additional authors not shown)
Abstract:
A new framework, called x-scape, for the combined study of both hard and soft transverse momentum sectors in high energy proton-proton ($p$-$p$) and proton-nucleus ($p$-$A$) collisions is set up. A dynamical initial state is set up using the 3d-Glauber model with transverse locations of hotspots within each incoming nucleon. A hard scattering that emanates from two colliding hotspots is carried out using the Pythia generator. Initial state radiation from the incoming hard partons is carried out in a new module called I-matter, which includes the longitudinal location of initial splits. The energy-momentum of both the initial hard partons and their associated beam remnants is removed from the hot spots, depleting the energy-momentum available for the formation of the bulk medium. Outgoing showers are simulated using the matter generator, and results are presented for both cases, allowing for and not allowing for energy loss. First comparisons between this hard-soft model and single inclusive hadron and jet data from $p$-$p$ and minimum bias $p$-$Pb$ collisions are presented. Single hadron spectra in $p$-$p$ are used to carry out a limited (in number of parameters) Bayesian calibration of the model. Fair comparisons with data are indicative of the utility of this new framework. Theoretical studies of the correlation between jet $p_T$ and event activity at mid and forward rapidity are carried out.
Submitted 24 July, 2024;
originally announced July 2024.
-
Tadpole conjecture in non-geometric backgrounds
Authors:
Katrin Becker,
Nathan Brady,
Mariana Graña,
Miguel Morros,
Anindya Sengupta,
Qi You
Abstract:
Calabi-Yau compactifications typically have a large number of complex structure and/or Kähler moduli that have to be stabilised in phenomenologically relevant vacua. The former can in principle be stabilised by fluxes in type IIB solutions. However, the tadpole conjecture proposes that the number of stabilised moduli can at most grow linearly with the tadpole charge of the fluxes required for stabilisation. We scrutinise this conjecture in the $2^6$ Gepner model: a non-geometric background mirror dual to a rigid Calabi-Yau manifold, in the deep interior of moduli space. By constructing an extensive set of supersymmetric Minkowski flux solutions, we spectacularly confirm the linear growth, while achieving a slightly higher ratio of stabilised moduli to flux charge than the conjectured upper bound. As a byproduct, we obtain for the first time a set of solutions within the tadpole bound where all complex structure moduli are massive. Since the $2^6$ model has no Kähler moduli, these show that the massless Minkowski conjecture does not hold beyond supergravity.
Submitted 23 May, 2025; v1 submitted 23 July, 2024;
originally announced July 2024.
-
Fully stabilized Minkowski vacua in the $2^6$ Landau-Ginzburg model
Authors:
Muthusamy Rajaguru,
Anindya Sengupta,
Timm Wrase
Abstract:
We study moduli stabilization via fluxes in the $2^6$ Landau-Ginzburg model. Fluxes not only give masses to scalar fields but can also induce higher order couplings that stabilize massless fields. We investigate this for several different flux choices in the $2^6$ model and find two examples that are inconsistent with the Refined Tadpole Conjecture. We also present, to our knowledge, the first 4d $\mathcal{N}=1$ Minkowski solution in string theory without any flat direction.
Submitted 26 May, 2025; v1 submitted 23 July, 2024;
originally announced July 2024.
-
Fast Scrambling at the Boundary
Authors:
Ancel Larzul,
Anirvan M. Sengupta,
Antoine Georges,
Marco Schirò
Abstract:
Many-body systems which saturate the quantum bound on chaos are attracting interest across a wide range of fields. Notable examples include the Sachdev-Ye-Kitaev model and its variations, all characterised by some form of randomness and all-to-all couplings. Here we study many-body quantum chaos in a quantum impurity model showing Non-Fermi-Liquid physics, the overscreened multichannel $SU(N)$ Kondo model. We compute exactly the low-temperature behavior of the out-of-time-order correlator in the limit of large $N$ and large number of channels $K$, at fixed ratio $γ=K/N$. Due to strong correlations at the impurity site, the spin fractionalizes into auxiliary fermions and bosons. We show that all the degrees of freedom of our theory acquire a Lyapunov exponent which is linear in temperature as $T\rightarrow 0$, with a prefactor that depends on $γ$. Remarkably, for $N=K$ the impurity spin displays maximal chaos, while bosons and fermions only get up to half of the maximal Lyapunov exponent. Our results highlight two new features: a non-disordered model which is maximally chaotic due to strong correlations at its boundary and a fractionalization of quantum chaos.
Submitted 18 July, 2024;
originally announced July 2024.