-
pared: Model selection using multi-objective optimization
Authors:
Priyam Das,
Sarah Robinson,
Christine B. Peterson
Abstract:
Motivation: Model selection is a ubiquitous challenge in statistics. For penalized models, model selection typically entails tuning hyperparameters to maximize a measure of fit or minimize out-of-sample prediction error. However, these criteria fail to reflect other desirable characteristics, such as model sparsity, interpretability, or smoothness. Results: We present the R package pared to enable the use of multi-objective optimization for model selection. Our approach entails the use of Gaussian process-based optimization to efficiently identify solutions that represent desirable trade-offs. Our implementation includes popular models with multiple objectives including the elastic net, fused lasso, fused graphical lasso, and group graphical lasso. Our R package generates interactive graphics that allow the user to identify hyperparameter values that result in fitted models which lie on the Pareto frontier. Availability: We provide the R package pared and vignettes illustrating its application to both simulated and real data at https://github.com/priyamdas2/pared.
Submitted 27 May, 2025;
originally announced May 2025.
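The Pareto frontier mentioned in the pared abstract can be illustrated with a minimal sketch. This is not the package's API (pared is an R package and its interface is not shown here); the function name `pareto_front` and the candidate values are hypothetical, and every objective is assumed to be minimized.

```python
from typing import List, Sequence


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of non-dominated points, assuming all objectives are minimized."""
    front = []
    for i, p in enumerate(points):
        dominated = False
        for j, q in enumerate(points):
            if j == i:
                continue
            # q dominates p if it is no worse in every objective
            # and strictly better in at least one.
            no_worse = all(a <= b for a, b in zip(q, p))
            strictly_better = any(a < b for a, b in zip(q, p))
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append(i)
    return front


# Hypothetical (prediction error, number of nonzero coefficients) pairs,
# one per hyperparameter setting.
candidates = [(0.30, 12), (0.25, 20), (0.40, 5), (0.35, 15), (0.25, 25)]
front_indices = pareto_front(candidates)
```

Each index in `front_indices` marks a hyperparameter setting for which no other setting is at least as good on both objectives and better on one. The quadratic scan is fine for a small grid; the Gaussian process-based search described in the abstract is a different, more efficient way to locate such points.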
-
A detailed study on ergodic to non-ergodic phase transition in the dissipative anisotropic Dicke model
Authors:
Pragna Das,
Saptarshi Saha
Abstract:
The ergodic to non-ergodic phase transition is one of the paramount features of the anisotropic Dicke model. Here, we thoroughly examine the effect of dissipation by analyzing both the eigenvalue and eigenvector properties of the Liouvillian with the aid of the scaling of the Liouvillian gap and the average participation ratio. We show that the properties of the eigenvectors of the Liouvillian are consistent with those of the eigenvalues, revealing a phase diagram that has similarities to the non-ergodic to ergodic transition in the closed undriven system. We also find that the Liouvillian gap is independent of system size in the non-ergodic phase, whereas in the ergodic phase it scales with the atom number $N$ as $N^{-z}$, where $0<z<1$. Moreover, we extend our analysis to the driven case where a Thue-Morse quasi-periodic sequence is applied and observe that the boson dissipation plays a pivotal role in stabilizing the prethermal plateau. Our investigation indicates that a non-ergodic phase is more favorable than the ergodic phase in the presence of bosonic dissipation.
Submitted 26 May, 2025;
originally announced May 2025.
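For readers unfamiliar with the Thue-Morse sequence named in the Dicke-model abstract, the sketch below generates its first terms. It shows only the standard binary sequence; how the authors map the two symbols onto drive operations is not stated in the abstract, so no such mapping is assumed here.

```python
from typing import List


def thue_morse(n: int) -> List[int]:
    """First n terms of the Thue-Morse sequence.

    Term k is the parity of the number of 1 bits in the binary expansion of k.
    """
    return [bin(k).count("1") % 2 for k in range(n)]


sequence = thue_morse(8)  # [0, 1, 1, 0, 1, 0, 0, 1]
```

The sequence is deterministic but never becomes periodic, which is why it serves as a quasi-periodic driving protocol.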
-
Optimal Intervention for Self-triggering Spatial Networks with Application to Urban Crime Analytics
Authors:
Pramit Das,
Moulinath Banerjee,
Yuekai Sun
Abstract:
In many network systems, events at one node trigger further activity at other nodes, e.g., social media users reacting to each other's posts or the clustering of criminal activity in urban environments. These systems are typically referred to as self-exciting networks. In such systems, targeted intervention at critical nodes can be an effective strategy for mitigating undesirable consequences such as further propagation of criminal activity or the spreading of misinformation on social media. In our work, we develop an optimal network intervention model to explore how targeted interventions at critical nodes can mitigate cascading effects throughout a Spatiotemporal Hawkes network. Similar models have been studied previously in the literature in purely temporal Hawkes networks, but in our work, we extend them to a spatiotemporal setup and demonstrate the efficacy of our methods by comparing the post-intervention reduction in intensity to other heuristic strategies in simulated networks. Subsequently, we use our method on crime data from the LA police department database to find neighborhoods for strategic intervention to demonstrate an application in predictive policing.
Submitted 26 May, 2025;
originally announced May 2025.
-
Ultraslow Growth of Domains in a Random-Field System With Correlated Disorder
Authors:
Subhanker Howlader,
Prasenjit Das,
Manoj Kumar
Abstract:
We study domain growth kinetics in a random-field system in the presence of a spatially correlated disorder $h_{i}(\vec r)$ after an instantaneous quench at a finite temperature $T$ from a random initial state corresponding to $T=\infty$. The correlated disorder field $h_{i}(\vec r)$ arises due to the presence of magnetic impurities, decaying spatially in a power-law fashion. We use Glauber spin-flip dynamics to simulate the kinetics at the microscopic level. The system evolves via the formation of ordered magnetic domains. We characterize the morphology of domains using the equal-time correlation function $C(r,t)$ and structure factor $S(k,t)$. In the large-$k$ limit, $S(k, t)$ obeys Porod's law: $S(k, t)\sim k^{-(d+1)}$. The average domain size $L(t)$ asymptotically follows double logarithmic growth behavior.
Submitted 11 May, 2025;
originally announced May 2025.
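The Glauber spin-flip dynamics mentioned in the random-field abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' simulation: it assumes a two-dimensional square lattice with periodic boundaries, the Hamiltonian $H = -J\sum_{\langle ij\rangle} s_i s_j - \sum_i h_i s_i$ with $J=1$, and uncorrelated uniform random fields standing in for the power-law correlated disorder of the paper. The name `glauber_sweep` is hypothetical.

```python
import math
import random


def glauber_sweep(spins, fields, temperature, coupling=1.0, rng=random):
    """One Monte Carlo sweep of Glauber dynamics on an L x L periodic lattice."""
    size = len(spins)
    for _ in range(size * size):
        i = rng.randrange(size)
        j = rng.randrange(size)
        neighbors = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                     + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
        # Energy change if spin (i, j) is flipped.
        delta_e = 2.0 * spins[i][j] * (coupling * neighbors + fields[i][j])
        # Glauber rate 1 / (1 + exp(delta_e / T)), written with tanh
        # so that large arguments do not overflow.
        flip_probability = 0.5 * (1.0 - math.tanh(delta_e / (2.0 * temperature)))
        if rng.random() < flip_probability:
            spins[i][j] = -spins[i][j]
    return spins


# Quench from a random (T = infinity) state to a finite temperature.
rng = random.Random(0)
size = 16
spins = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
# Assumption: uncorrelated fields in [-0.5, 0.5]; the paper uses correlated disorder.
fields = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(size)]
for _ in range(10):
    glauber_sweep(spins, fields, temperature=0.5, rng=rng)
```

Measuring $C(r,t)$, $S(k,t)$, and $L(t)$ would require storing snapshots at successive sweeps, which is omitted here.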
-
Exploring unconventional superconductivity in PdTe via Point Contact Spectroscopy
Authors:
Pritam Das,
Sulagna Dutta,
Saurav Suman,
Amit Vashist,
Bibek Ranjan Satapathy,
John Jesudasan,
Suvankar Chakraverty,
Rajdeep Sensarma,
Pratap Raychaudhuri
Abstract:
Palladium Telluride (PdTe), a non-layered intermetallic crystalline compound, has captured attention for its unique superconducting properties and strong spin-orbit coupling. In this work, we investigate the superconducting state of PdTe using point-contact Andreev reflection (PCAR) spectroscopy. The experimental data are analyzed using the Blonder-Tinkham-Klapwijk (BTK) model for s-, p-, and d-wave symmetries. Our results reveal clear evidence of unconventional superconductivity. The superconducting gap shows features consistent with either p-wave or d-wave pairing symmetry but cannot be fitted with s-wave symmetry. The observed anisotropic gap structure and deviations from conventional BCS behaviour highlight the complex nature of the pairing interactions in PdTe. These findings provide strong evidence of unconventional pairing symmetry in this material.
Submitted 8 May, 2025;
originally announced May 2025.
-
Certain Observations on Ideals Associated With Weighted Density Using Modulus Functions
Authors:
Pratulananda Das,
Subhankar Das
Abstract:
In this article our main object of investigation is the simple modular density ideals $\mathcal{Z}_g(f)$ introduced in [Bose et al., Indag. Math., 2018], where $g$ is a weight function, more precisely, $g\in G$, $G=\{g:ω\to [0,\infty):\frac{k}{g(k)}\not\to 0 \text{ and }\:\: g(k)\to \infty \text{ as }\:\:k\to \infty \}$, and $f$ is an unbounded modulus function. We mainly investigate certain properties of these ideals along the lines of [Kwela et al., J. Math. Anal. Appl., 2019]. For an unbounded modulus function $f$ it is shown that there are $1$ or $\mathfrak{c}$ many functions $g\in G$ generating the same ideal $\mathcal{Z}_g(f)$. We then obtain certain interactive results involving the sequence of submeasures $\{φ_k\}_{k\in ω}$ generating the ideal $\mathcal{Z}_g(f)$ and the functions $g,f$. Finally, we present some observations on $\mathcal{Z}_g(f)$ ideals related to the notion of increasing-invariance.
Submitted 5 May, 2025;
originally announced May 2025.
-
Social Biases in Knowledge Representations of Wikidata separates Global North from Global South
Authors:
Paramita Das,
Sai Keerthana Karnam,
Aditya Soni,
Animesh Mukherjee
Abstract:
Knowledge Graphs have become increasingly popular due to their wide usage in various downstream applications, including information retrieval, chatbot development, language model construction, and many others. Link prediction (LP) is a crucial downstream task for knowledge graphs, as it helps to address the problem of the incompleteness of the knowledge graphs. However, previous research has shown that knowledge graphs, often created in a (semi) automatic manner, are not free from social biases. These biases can have harmful effects on downstream applications, especially by leading to unfair behavior toward minority groups. To understand this issue in detail, we develop a framework -- AuditLP -- deploying fairness metrics to identify biased outcomes in LP, specifically how occupations are classified as either male or female-dominated based on gender as a sensitive attribute. We have experimented with the sensitive attribute of age and observed that occupations are categorized as young-biased, old-biased, and age-neutral. We conduct our experiments on a large number of knowledge triples that belong to 21 different geographies extracted from the open-sourced knowledge graph, Wikidata. Our study shows that the variance in the biased outcomes across geographies neatly mirrors the socio-economic and cultural division of the world, resulting in a transparent partition of the Global North from the Global South.
Submitted 5 May, 2025;
originally announced May 2025.
-
Fractional $p$-Laplace systems with critical Hardy nonlinearities: Existence and Multiplicity
Authors:
Nirjan Biswas,
Paramananda Das,
Shilpa Gupta
Abstract:
Let $Ω\subset \mathbb{R}^d$ be a bounded open set containing zero, $s \in (0,1)$ and $p \in (1, \infty)$. In this paper, we first deal with the existence, non-existence and some properties of ground-state solutions for the following class of fractional $p$-Laplace systems \begin{equation*} \left\{\begin{aligned} &(-Δ_p)^s u= \fracα{q} \frac{|u|^{α-2}u|v|^β}{|x|^m} \;\;\text{in}\;Ω,\\ &(-Δ_p)^s v= \fracβ{q} \frac{|v|^{β-2}v|u|^α}{|x|^m}\;\;\text{in}\;Ω,\\ &u=v=0\, \mbox{ in }\mathbb{R}^d\setminus Ω, \end{aligned} \right. \end{equation*} where $d>sp$, $α+ β= q$ where $p \leq q \leq p_{s}^{*}(m)$ where $p_{s}^{*}(m) = \frac{p(d-m)}{d-sp}$ with $0 \leq m \le sp$. Additionally, we establish a concentration-compactness principle related to this homogeneous system of equations. Next, the main objective of this paper is to study the following non-homogenous system of equations \begin{equation*} \left\{\begin{aligned} &(-Δ_p)^s u = η|u|^{r-2}u + γ\fracα{p_{s}^{*}(m)} \frac{|u|^{α-2}u|v|^β}{|x|^m} \;\;\text{in}\;Ω,\\ &(-Δ_p)^s v = η|v|^{r-2}v + γ\fracβ{p^{*}_{s}(m)} \frac{|v|^{β-2}v|u|^α}{|x|^m}\;\;\text{in}\;Ω,\\ &u=v=0\, \mbox{ in }\mathbb{R}^d\setminus Ω, \end{aligned} \right. \end{equation*} where $η, γ> 0$ are parameters and $p \leq r < p_{s}^{*}(0)$. Depending on the values of $η, γ$, we obtain the existence of a non semi-trivial solution with the least energy. Further, for $m=0$, we establish that the above problem admits at least $\text{cat}_Ω(Ω)$ nontrivial solutions.
Submitted 28 April, 2025;
originally announced April 2025.
-
Pathwise Itô isometry for scaled quadratic variation
Authors:
Suprio Bhar,
Purba Das,
Barun Sarkar
Abstract:
The concept of scaled quadratic variation was originally introduced by E. Gladyshev in 1961 in the context of Gaussian processes, where it was defined as the limit of the covariance of the underlying Gaussian process. In this paper, we extend this notion beyond the Gaussian framework for any real-valued continuous function by formulating it in a pathwise manner along a given sequence of partitions. We demonstrate that, for classical Gaussian processes such as fractional Brownian motion, this pathwise definition coincides with the traditional one up to a constant factor. Furthermore, we establish that the scaled quadratic variation is invariant under smooth transformations and satisfies a pathwise Itô isometry-type result, derived without relying on any expectation arguments.
Submitted 25 April, 2025;
originally announced April 2025.
-
Quasilinear problems with mixed local-nonlocal operator and concave-critical nonlinearities: Multiplicity of positive solutions
Authors:
Mousomi Bhakta,
Nirjan Biswas,
Paramananda Das
Abstract:
We study the existence and multiplicity of positive solutions for the following concave-critical problem driven by an operator of mixed order obtained by the sum of the classical $p$-Laplacian and of the fractional $p$-Laplacian, \begin{equation}\tag{$\mathcal{P}_{λ,\varepsilon}$}
-Δ_p u+\varepsilon(-Δ_p)^s u=λ|u|^{q-2}u+|u|^{p^*-2}u \;\text{ in }Ω,\quad
u=0 \; \text{ in }\mathbb{R}^N \setminus Ω, \end{equation} where $Ω\subset\mathbb{R}^N$ is a bounded open set, $\varepsilon\in(0,1]$, $0<s<1<q<p<N$, $p^*=\frac{Np}{N-p}$, and $λ\in \mathbb{R}$ is a parameter. For $λ\leq 0$, we show that ($\mathcal{P}_{λ,\varepsilon}$) has no nontrivial solution. For $λ>0$, we prove Ambrosetti-Brezis-Cerami type results. In particular, we prove the existence of $Λ_\varepsilon$ such that ($\mathcal{P}_{λ,\varepsilon}$) has a positive minimal solution for $0<λ<Λ_\varepsilon$, a positive solution for $λ=Λ_\varepsilon$, and no positive solution for $λ>Λ_\varepsilon$. We also prove the existence of $0<λ^\#\leqΛ_\varepsilon$ such that ($\mathcal{P}_{λ,\varepsilon}$) has at least two positive solutions for $λ\in(0,λ^\#)$, provided $\varepsilon$ is small enough. This extends the recent results of Biagi and Vecchi (Nonlinear Anal. 256 (2025), 113795) and Amundsen et al. (Commun. Pure Appl. Anal., 22(10):3139-3164, 2023) from $p=2$ to the general $1<p<N$. Additionally, it extends the classical result of Azorero and Peral (Indiana Univ. Math. J., 43(3):947-957, 1994) to mixed local-nonlocal quasilinear problems. Moreover, our results complement the multiplicity results for nonnegative solutions in da Silva et al. (J. Differential Equations, 408:494-536, 2024).
Submitted 21 April, 2025;
originally announced April 2025.
-
Phase Separation in Active Binary Mixtures With Chemical Reaction
Authors:
Sayantan Mondal,
Prasenjit Das
Abstract:
We study motility-induced phase separation (MIPS) in active AB binary mixtures undergoing the chemical reaction $A \rightleftharpoons B$. Starting from the evolution equations for the density fields $ρ_i(\vec r, t)$ describing MIPS, we phenomenologically incorporate the effects of the reaction through the reaction rate $Γ$ into the equations. The steady-state domain morphologies depend on $Γ$ and the relative activity of the species, $Δ$. For a sufficiently large $Γ$ and $Δ\ne 1$, the more active component of the mixture forms a droplet morphology. We characterize the morphology of domains by calculating the equal-time correlation function $C(r, t)$ and the structure factor $S(k, t)$, exhibiting scaling violation. The average domain size, $L(t)$, follows a diffusive growth as $L(t)\sim t^{1/3}$ before reaching the steady state domain size, $L_{\rm ss}$. Additionally, $L_{\rm ss}$ shows the scaling relation $L_{\rm ss}\simΓ^{-1/4}$, independent of $Δ$.
Submitted 16 April, 2025;
originally announced April 2025.
-
Velocity Distribution and Diffusion of an Athermal Inertial Run-and-Tumble Particle in a Shear-Thinning Medium
Authors:
Sayantan Mondal,
Prasenjit Das
Abstract:
We study the dynamics of an athermal inertial active particle moving in a shear-thinning medium in $d=1$. The viscosity of the medium is modeled using a Coulomb-tanh function, while the activity is represented by an asymmetric dichotomous noise with strengths $-Δ$ and $μΔ$, transitioning between these states at a rate $λ$. Starting from the Fokker-Planck (FP) equation for the time-dependent probability distributions $P(v,-Δ,t)$ and $P(v,μΔ,t)$ of the particle's velocity $v$ at time $t$, moving under the influence of active forces $-Δ$ and $μΔ$ respectively, we analytically derive the steady-state velocity distribution function $P_s(v)$, explicitly dependent on $μ$. Also, we obtain a quadrature expression for the effective diffusion coefficient $D_e$ for the symmetric active force case ($μ=1$). For a given $Δ$ and $μ$, we show that $P_s(v)$ exhibits multiple transitions as $λ$ is varied. Subsequently, we numerically compute $P_s(v)$, the mean-squared velocity $\langle v^2\rangle(t)$, and the diffusion coefficient $D_e$ by solving the particle's equation of motion, all of which show excellent agreement with the analytical results in the steady state. Finally, we examine the universal nature of the transitions in $P_s(v)$ by considering an alternative functional form of the medium's viscosity that also captures the shear-thinning behavior.
Submitted 15 April, 2025;
originally announced April 2025.
-
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Authors:
Avinash Kumar,
Shashank Nag,
Jason Clemons,
Lizy John,
Poulami Das
Abstract:
Deploying large language models (LLMs) presents critical challenges due to the inherent trade-offs associated with key performance metrics, such as latency, accuracy, and throughput. Typically, gains in one metric are accompanied by degradation in others. Early-Exit LLMs (EE-LLMs) efficiently navigate this trade-off space by skipping some of the later model layers when they confidently find an output token early, thus reducing latency without impacting accuracy. However, as the early exits taken depend on the task and are unknown a priori to request processing, EE-LLMs conservatively load the entire model, limiting resource savings and throughput. Also, current frameworks statically select a model for a user task, limiting our ability to adapt to the changing nature of the input queries.
We propose HELIOS to address these challenges. First, HELIOS shortlists a set of candidate LLMs, evaluates them using a subset of prompts, gathering telemetry data in real-time. Second, HELIOS uses the early exit data from these evaluations to greedily load the selected model only up to a limited number of layers. This approach yields memory savings which enables us to process more requests at the same time, thereby improving throughput. Third, HELIOS monitors and periodically reassesses the performance of the candidate LLMs and if needed, switches to another model that can service incoming queries more efficiently (such as using fewer layers without lowering accuracy). Our evaluations show that HELIOS achieves 1.48$\times$ throughput, 1.10$\times$ energy-efficiency, 1.39$\times$ lower response time, and 3.7$\times$ improvements in inference batch sizes compared to the baseline, when optimizing for the respective service level objectives.
Submitted 14 April, 2025;
originally announced April 2025.
-
Equivalence of Variants of Shadowing of Free Semigroup Actions
Authors:
Pramod Kumar Das,
Priyabrata Bag
Abstract:
We prove that for finitely generated free semigroup actions the average shadowing property, the weak asymptotic average shadowing property, the mean ergodic shadowing property, the almost asymptotic average shadowing property, the asymptotic average shadowing property and the $M_α$-shadowing property for every $α\in (0,1)$, are equivalent. This gives an affirmative answer to an open question asked in Question 10.3 [M. Kulczycki, D. Kwietniak, P. Oprocha, On almost specification and average shadowing properties, Fundamenta Mathematicae, 224 (2014)].
Submitted 10 April, 2025;
originally announced April 2025.
-
Context Switching for Secure Multi-programming of Near-Term Quantum Computers
Authors:
Avinash Kumar,
Meng Wang,
Chenxu Liu,
Ang Li,
Prashant J. Nair,
Poulami Das
Abstract:
Multi-programming quantum computers improve device utilization and throughput. However, crosstalk from concurrent two-qubit CNOT gates poses security risks, compromising the fidelity and output of co-running victim programs. We design Zero Knowledge Tampering Attacks (ZKTAs), using which attackers can exploit crosstalk without knowledge of the hardware error profile. ZKTAs can alter victim program outputs in 40% of cases on commercial systems.
We identify that ZKTAs succeed because the attacker's program consistently runs with the same victim program in a fixed context. To mitigate this, we propose QONTEXTS: a context-switching technique that defends against ZKTAs by running programs across multiple contexts, each handling only a subset of trials. QONTEXTS uses multi-programming with frequent context switching while identifying a unique set of programs for each context. This exposes only a fraction of the execution to ZKTAs. We enhance QONTEXTS with attack detection capabilities that compare the distributions from different contexts against each other to identify noisy contexts executed with ZKTAs. Our evaluations on real IBMQ systems show that QONTEXTS increases program resilience by three orders of magnitude and fidelity by 1.33$\times$ on average. Moreover, QONTEXTS improves throughput by 2$\times$, advancing security in multi-programmed environments.
Submitted 17 April, 2025; v1 submitted 9 April, 2025;
originally announced April 2025.
-
Two-Axis planar Hall magnetic field sensors with sub nanoTesla resolution
Authors:
Proloy Taran Das,
Hariharan Nhalil,
Vladislav Mor,
Moty Schultz,
Nir Hasidim,
Asaf Grosz,
Lior Klein
Abstract:
Planar Hall effect (PHE) magnetic sensors are attractive for various applications where the required field resolution is in the sub-nanotesla or picotesla range. Here we present a detailed noise study of PHE sensors consisting of two or three intersecting ellipses. Such sensors can be used to measure two axes of the magnetic field in the sensor plane, in particular along the two perpendicular easy axes in the overlapping region for two intersecting ellipses and along three easy axes at an angle of 60 degrees for three crossing ellipses. Thus, for each remanent magnetic state in the overlap area, the sensor can measure the vector component of the magnetic field perpendicular to the direction of the remanent magnetization. The two field components are measured with a field resolution less than 200 pT/sqrt(Hz) at 10 Hz and 350 pT/sqrt(Hz) at 1 Hz in the same region, while maintaining a size and noise level similar to those of a single-axis sensor. Furthermore, we discuss a possible route for future improvement of the field resolution.
Submitted 9 April, 2025;
originally announced April 2025.
-
PEEL the Layers and Find Yourself: Revisiting Inference-time Data Leakage for Residual Neural Networks
Authors:
Huzaifa Arif,
Keerthiram Murugesan,
Payel Das,
Alex Gittens,
Pin-Yu Chen
Abstract:
This paper explores inference-time data leakage risks of deep neural networks (NNs), where a curious and honest model service provider is interested in retrieving users' private data inputs solely based on the model inference results. Particularly, we revisit residual NNs due to their popularity in computer vision and our hypothesis that residual blocks are a primary cause of data leakage owing to the use of skip connections. By formulating inference-time data leakage as a constrained optimization problem, we propose a novel backward feature inversion method, PEEL, which can effectively recover block-wise input features from the intermediate output of residual NNs. The surprising results in high-quality input data recovery can be explained by the intuition that the output from these residual blocks can be considered as a noisy version of the input and thus the output retains sufficient information for input recovery. We demonstrate the effectiveness of our layer-by-layer feature inversion method on facial image datasets and pre-trained classifiers. Our results show that PEEL outperforms the state-of-the-art recovery methods by an order of magnitude when evaluated by mean squared error (MSE). The code is available at https://github.com/Huzaifa-Arif/PEEL.
Submitted 8 April, 2025;
originally announced April 2025.
-
Exchange-Biased multi-ring Planar Hall Magnetoresistive Sensors with nT resolution in Non-Shielded Environments
Authors:
Jan Schmidtpeter,
Proloy Taran Das,
Yevhen Zabila,
Conrad Schubert,
Thomas Gundrum,
Thomas Wondrak,
Denys Makarov
Abstract:
Planar Hall magnetoresistive sensors (PHMR) are promising candidates for various magnetic sensing applications due to their high sensitivity, low power consumption, and compatibility with integrated circuit technology. However, their performance is often limited by inherent noise sources, impacting their resolution and overall sensitivity. Here the effect of three bilayer structures NiFe(10 nm)/IrMn(10 nm), NiFe(30 nm)/IrMn(10 nm), and NiFe(30 nm)/IrMn(20 nm) on noise levels is investigated at low-frequency (DC - 25 Hz). This study includes a detailed investigation on the optimization process and noise characteristics of multiring PHMR sensors, focusing on identifying and quantifying the dominant noise sources. The experimental measurements are complemented by a theoretical analysis of noise sources including thermal noise, 1/f noise, intermixing and environmental noise. The best magnetic resolution is observed for the NiFe(30 nm)/IrMn(10 nm) structure, which achieves a detectivity below 1.5 nT/sqrt(Hz) at 10 Hz in a non-shielded environment at room temperature. In addition, a substantial improvement in sensitivity is observed by annealing the sensors at 250 deg C for 1 hour. The findings of this study contribute to a deeper understanding of noise behavior in PHMR sensors, paving the way for developing strategies to improve their performance for demanding sensing applications at low frequencies.
Submitted 8 April, 2025;
originally announced April 2025.
-
Studying stellar populations in Omega Centauri with phylogenetics
Authors:
P. Jofré,
C. Aguilera-Gómez,
P. Villarreal,
F. A. Cubillos,
P. Das,
X. Hua,
R. Yates,
P. Silva,
S. Vitali,
T. Peña,
T. Signor,
K. Walsen,
P. Tissera,
A. Rojas-Arriagada,
E. Johnston,
G. Gilmore,
R. Foley
Abstract:
The nature and formation history of our Galaxy's largest and most enigmatic stellar cluster, known as Omega Centauri ($\omega$ Cen), remain debated. Here, we offer a novel approach to disentangling the complex stellar populations within $\omega$ Cen based on phylogenetics methodologies from evolutionary biology. These include the Gaussian Mixture Model and Neighbor-Joining clustering algorithms applied to a set of chemical abundances of $\omega$ Cen stellar members. Instead of using the classical approach in astronomy of grouping them into separate populations, we focused on how the stars are related to each other. In this way, we could identify stars that likely formed in globular clusters versus those originating from prolonged in-situ star formation and how these stars interconnect. Our analysis supports the hypothesis that $\omega$ Cen might be a nuclear star cluster of a galaxy accreted by the Milky Way with a mass of about $10^9 \, M_{\odot}$. Furthermore, we revealed the existence of a previously unidentified in-situ stellar population with a distinct chemical pattern unlike any known population found in the Milky Way to date. Our analysis of $\omega$ Cen is an example of the success of cross-disciplinary research and shows the vast potential of applying evolutionary biology tools to astronomical datasets, opening new avenues for understanding the chemical evolution of complex stellar systems.
Submitted 2 April, 2025;
originally announced April 2025.
-
Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models
Authors:
Pin-Yu Chen,
Han Shen,
Payel Das,
Tianyi Chen
Abstract:
Fine-tuning Large Language Models (LLMs) on some task-specific datasets has been a primary use of LLMs. However, it has been empirically observed that this approach to enhancing capability inevitably compromises safety, a phenomenon also known as the safety-capability trade-off in LLM fine-tuning. This paper presents a theoretical framework for understanding the interplay between safety and capability in two primary safety-aware LLM fine-tuning strategies, providing new insights into the effects of data similarity, context overlap, and alignment loss landscape. Our theoretical results characterize the fundamental limits of the safety-capability trade-off in LLM fine-tuning, which are also validated by numerical experiments.
Submitted 24 March, 2025;
originally announced March 2025.
-
Evaluating Negative Sampling Approaches for Neural Topic Models
Authors:
Suman Adhya,
Avishek Lahiri,
Debarshi Kumar Sanyal,
Partha Pratim Das
Abstract:
Negative sampling has emerged as an effective technique that enables deep learning models to learn better representations by introducing the paradigm of learn-to-compare. The goal of this approach is to add robustness to deep learning models to learn better representation by comparing the positive samples against the negative ones. Despite its numerous demonstrations in various areas of computer vision and natural language processing, a comprehensive study of the effect of negative sampling in an unsupervised domain like topic modeling has not been well explored. In this paper, we present a comprehensive analysis of the impact of different negative sampling strategies on neural topic models. We compare the performance of several popular neural topic models by incorporating a negative sampling technique in the decoder of variational autoencoder-based neural topic models. Experiments on four publicly available datasets demonstrate that integrating negative sampling into topic models results in significant enhancements across multiple aspects, including improved topic coherence, richer topic diversity, and more accurate document classification. Manual evaluations also indicate that the inclusion of negative sampling into neural topic models enhances the quality of the generated topics. These findings highlight the potential of negative sampling as a valuable tool for advancing the effectiveness of neural topic models.
Submitted 25 March, 2025; v1 submitted 23 March, 2025;
originally announced March 2025.
-
Developing cholera outbreak forecasting through qualitative dynamics: Insights into Malawi case study
Authors:
Adrita Ghosh,
Parthasakha Das,
Tanujit Chakraborty,
Pritha Das,
Dibakar Ghosh
Abstract:
Cholera, an acute diarrheal disease, is a serious concern in developing and underdeveloped areas. A qualitative understanding of cholera epidemics aims to foresee transmission patterns based on reported data and mechanistic models. The mechanistic model is a crucial tool for capturing the dynamics of disease transmission and population spread. However, using real-time cholera cases is essential for forecasting the transmission trend. This prospective study seeks to furnish insights into transmission trends through qualitative dynamics followed by machine learning-based forecasting. The Monte Carlo Markov Chain approach is employed to calibrate the proposed mechanistic model. We identify critical parameters that illustrate the disease's dynamics using partial rank correlation coefficient-based sensitivity analysis. The basic reproduction number as a crucial threshold measures asymptotic dynamics. Furthermore, forward bifurcation directs the stability of the infection state, and Hopf bifurcation suggests that trends in transmission may become unpredictable as societal disinfection rates rise. Further, we develop epidemic-informed machine learning models by incorporating mechanistic cholera dynamics into autoregressive integrated moving averages and autoregressive neural networks. We forecast short-term future cholera cases in Malawi by implementing the proposed epidemic-informed machine learning models to support this. We assert that integrating temporal dynamics into the machine learning models can enhance the capabilities of cholera forecasting models. The execution of this mechanism can significantly influence future trends in cholera transmission. This evolving approach can also be beneficial for policymakers to interpret and respond to potential disease systems. Moreover, our methodology is replicable and adaptable, encouraging future research on disease dynamics.
Submitted 18 March, 2025;
originally announced March 2025.
-
Deep Neural Network-Based Voltage Prediction for Alkali-Metal-Ion Battery Materials
Authors:
Sk Mujaffar Hossain,
Namitha Anna Koshi,
Seung-Cheol Lee,
G. P Das,
Satadeep Bhattacharjee
Abstract:
Accurate voltage prediction of battery materials plays a pivotal role in advancing energy storage technologies and in the rational design of high-performance cathode materials. In this work, we present a deep neural network (DNN) model, built using PyTorch, to estimate the average voltage of cathode materials across Li-ion, Na-ion, and other alkali-metal-ion batteries. The model is trained on an extensive dataset from the Materials Project, incorporating a wide range of descriptors (structural, physical, chemical, electronic, thermodynamic, and battery-specific), ensuring a comprehensive representation of material properties. Our model exhibits strong predictive performance, as corroborated by first-principles density functional theory (DFT) calculations. The close alignment between the DNN predictions and DFT outcomes highlights the robustness and accuracy of our machine learning framework in effectively screening and identifying viable battery materials. Utilizing this validated model, we successfully propose novel Na-ion battery compositions, with their predicted behavior confirmed through rigorous computational assessment. By seamlessly integrating data-driven prediction with first-principles validation, this study presents an effective framework that significantly accelerates the discovery and optimization of advanced battery materials, contributing to the development of more reliable and efficient energy storage technologies.
Submitted 3 April, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
Can Memory-Augmented Language Models Generalize on Reasoning-in-a-Haystack Tasks?
Authors:
Payel Das,
Ching-Yun Ko,
Sihui Dai,
Georgios Kollias,
Subhajit Chaudhury,
Aurelie Lozano
Abstract:
Large language models often expose their brittleness in reasoning tasks, especially while executing long chains of reasoning over context. We propose MemReasoner, a new and simple memory-augmented LLM architecture, in which the memory learns the relative order of facts in context, and enables hopping over them, while the decoder selectively attends to the memory. MemReasoner is trained end-to-end, with optional supporting fact supervision of varying degrees. We train MemReasoner, along with existing memory-augmented transformer models and a state-space model, on two distinct synthetic multi-hop reasoning tasks. Experiments performed under a variety of challenging scenarios, including the presence of long distractor text or target answer changes in the test set, show strong generalization of MemReasoner on both single- and two-hop tasks. This generalization of MemReasoner is achieved using none-to-weak supporting fact supervision (using none and 1% of supporting facts for one- and two-hop tasks, respectively). In contrast, baseline models overall struggle to generalize and benefit far less from using full supporting fact supervision. The results highlight the importance of explicit memory mechanisms, combined with additional weak supervision, for improving large language models' context-processing ability toward reasoning tasks.
Submitted 10 March, 2025;
originally announced March 2025.
-
Unsupervised Multi-Clustering and Decision-Making Strategies for 4D-STEM Orientation Mapping
Authors:
Junhao Cao,
Nicolas Folastre,
Gozde Oney,
Edgar Rauch,
Stavros Nicolopoulos,
Partha Pratim Das,
Arnaud Demortière
Abstract:
This study presents a novel integration of unsupervised learning and decision-making strategies for the advanced analysis of 4D-STEM datasets, with a focus on non-negative matrix factorization (NMF) as the primary clustering method. Our approach introduces a systematic framework to determine the optimal number of components (k) required for robust and interpretable orientation mapping. By leveraging the K-Component Loss method and Image Quality Assessment (IQA) metrics, we effectively balance reconstruction fidelity and model complexity. Additionally, we highlight the critical role of dataset preprocessing in improving clustering stability and accuracy. Furthermore, our spatial weight matrix analysis provides insights into overlapping regions within the dataset by employing threshold-based visualization, facilitating a detailed understanding of cluster interactions. The results demonstrate the potential of combining NMF with advanced IQA metrics and preprocessing techniques for reliable orientation mapping and structural analysis in 4D-STEM datasets, paving the way for future applications in multi-dimensional material characterization.
Submitted 9 March, 2025;
originally announced March 2025.
-
EDGE: The emergence of dwarf galaxy scaling relations from cosmological radiation-hydrodynamics simulations
Authors:
Martin P. Rey,
Ethan Taylor,
Emily I. Gray,
Stacy Y. Kim,
Eric P. Andersson,
Andrew Pontzen,
Oscar Agertz,
Justin I. Read,
Corentin Cadiou,
Robert M. Yates,
Matthew D. A. Orkney,
Dirk Scholte,
Amélie Saintonge,
Joseph Breneman,
Kristen B. W. McQuinn,
Claudia Muni,
Payel Das
Abstract:
We present a new suite of EDGE (`Engineering Dwarfs at Galaxy formation's Edge') cosmological zoom simulations. The suite includes 15 radiation-hydrodynamical dwarf galaxies covering the ultra-faint to the dwarf irregular regime ($10^4 \leq M_{\star}(z=0) \leq 10^8 \, M_{\odot}$) to enable comparisons with observed scaling relations. Each object in the suite is evolved at high resolution ($\approx 3 \, \text{pc}$) and includes stellar radiation, winds and supernova feedback channels. We compare with previous EDGE simulations without radiation, finding that radiative feedback results in significantly weaker galactic outflows. This generalises our previous findings to a wide mass range, and reveals that the effect is most significant at low $M_{\star}$. Despite this difference, stellar masses stay within a factor of two of each other, and key scaling relations of dwarf galaxies (size-mass, neutral gas-stellar mass, gas-phase mass-metallicity) emerge correctly in both simulation suites. Only the stellar mass -- stellar metallicity relation is strongly sensitive to the change in feedback. This highlights how obtaining statistical samples of dwarf galaxy stellar abundances with next-generation spectrographs will be key to probing and constraining the baryon cycle of dwarf galaxies.
Submitted 5 March, 2025;
originally announced March 2025.
-
Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
Authors:
Ravi Ghadia,
Avinash Kumar,
Gaurav Jain,
Prashant Nair,
Poulami Das
Abstract:
Autoregressive Transformers rely on Key-Value (KV) caching to accelerate inference. However, the linear growth of the KV cache with context length leads to excessive memory consumption and bandwidth constraints. This bottleneck is particularly problematic in real-time applications -- such as chatbots and interactive assistants -- where low latency and high memory efficiency are critical. Existing methods drop distant tokens or compress states in a lossy manner, sacrificing accuracy by discarding vital context or introducing bias.
We propose MorphKV, an inference-time technique that maintains a constant-sized KV cache while preserving accuracy. MorphKV balances long-range dependencies and local coherence during text generation. It eliminates early-token bias while retaining high-fidelity context by adaptively ranking tokens through correlation-aware selection. Unlike heuristic retention or lossy compression, MorphKV iteratively refines the KV cache via lightweight updates guided by attention patterns of recent tokens. This approach captures inter-token correlation with greater accuracy, crucial for tasks like content creation and code generation. Our studies on long-response tasks show 52.9% memory savings and 18.2% higher accuracy on average compared to state-of-the-art prior works, enabling efficient real-world deployment.
Submitted 2 March, 2025;
originally announced March 2025.
-
Applications of the Quantum Phase Difference Estimation Algorithm to the Excitation Energies in Spin Systems on a NISQ Device
Authors:
Boni Paul,
Sudhindu Bikash Mandal,
Kenji Sugisaki,
B. P. Das
Abstract:
The Quantum Phase Difference Estimation (QPDE) algorithm, as an extension of the Quantum Phase Estimation (QPE), is a quantum algorithm designed to compute the differences of two eigenvalues of a unitary operator by exploiting the quantum superposition of two eigenstates. Unlike QPE, QPDE is free of controlled-unitary operations, and is suitable for calculations on noisy intermediate-scale quantum (NISQ) devices. We present the implementation and verification of a novel early fault-tolerant QPDE algorithm for determining energy gaps across diverse spin system configurations using NISQ devices. The algorithm is applied to the systems described by two- and three-spin Heisenberg Hamiltonians with different geometric arrangements and coupling strengths, including symmetric, asymmetric, spin-frustrated, and non-frustrated configurations. By leveraging the match gate-like structure of the time evolution operator of the Heisenberg Hamiltonian, we achieve constant-depth quantum circuits suitable for NISQ hardware implementation. Our results on IBM quantum processors show remarkable accuracy ranging from 85% to 93%, demonstrating excellent agreement with classical calculations even in the presence of hardware noise. The methodology incorporates sophisticated quantum noise suppression techniques, including Pauli Twirling and Dynamical Decoupling, and employs an adaptive framework. Our findings demonstrate the practical viability of the QPDE algorithm for quantum many-body simulations on current NISQ hardware, establishing a robust framework for future applications.
Submitted 27 February, 2025;
originally announced February 2025.
-
Quantum Imaging of Photonic Spin Texture in an OAM Beam with NV Centers in Diamond
Authors:
Shoaib Mahmud,
Wei Zhang,
Farid Kalhor,
Pronoy Das,
Zubin Jacob
Abstract:
Photonic spin texture (PST), the spatial distribution of the spin angular momentum (SAM) of light, is connected to unique properties of light, such as optical skyrmions and topological optical N-invariants. There has been recent progress on the generation and manipulation of PST using various methodologies. However, a challenge remains for the sub-wavelength characterization of PST. Here, we demonstrate nitrogen-vacancy (NV) centers in diamond as nanoscale quantum sensors for imaging the PST of a beam with orbital angular momentum (OAM). Leveraging the coherent interaction between photon spin and NV center electron spin at cryogenic temperature (77 K), and using the Hahn-Echo magnetometry technique, we experimentally demonstrate the imprinting of the PST on the quantum phase of NV centers. Our work can lead to the development of a quantum imaging platform capable of characterization of the spin texture of light at sub-wavelength scales.
Submitted 25 February, 2025;
originally announced February 2025.
-
EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts
Authors:
Subhajit Chaudhury,
Payel Das,
Sarathkrishna Swaminathan,
Georgios Kollias,
Elliot Nelson,
Khushbu Pahwa,
Tejaswini Pedapati,
Igor Melnyk,
Matthew Riemer
Abstract:
Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce EpMAN, a method for processing long contexts in an episodic memory module while holistically attending to semantically relevant context chunks. The output of episodic attention is then used to reweigh the decoder's self-attention to the stored KV cache of the context during training and generation. When an LLM decoder is trained using EpMAN, its performance on multiple challenging single-hop long-context recall and question-answering benchmarks is found to be stronger and more robust across the range from 16k to 256k tokens than baseline decoders trained with self-attention, and popular retrieval-augmented generation frameworks.
Submitted 20 February, 2025;
originally announced February 2025.
-
REVERSUM: A Multi-staged Retrieval-Augmented Generation Method to Enhance Wikipedia Tail Biographies through Personal Narratives
Authors:
Sayantan Adak,
Pauras Mangesh Meher,
Paramita Das,
Animesh Mukherjee
Abstract:
Wikipedia is an invaluable resource for factual information about a wide range of entities. However, the quality of articles on less-known entities often lags behind that of the well-known ones. This study proposes a novel approach to enhancing Wikipedia's B and C category biography articles by leveraging personal narratives such as autobiographies and biographies. By utilizing a multi-staged retrieval-augmented generation technique -- REVerSum -- we aim to enrich the informational content of these lesser-known articles. Our study reveals that personal narratives can significantly improve the quality of Wikipedia articles, providing a rich source of reliable information that has been underutilized in previous studies. Based on crowd-based evaluation, REVerSum-generated content outperforms the best performing baseline by 17% in terms of integrability to the original Wikipedia article and 28.5% in terms of informativeness. Code and Data are available at: https://github.com/sayantan11995/wikipedia_enrichment
Submitted 17 February, 2025;
originally announced February 2025.
-
Agency in Artificial Intelligence Systems
Authors:
Parashar Das
Abstract:
There is a general concern that present developments in artificial intelligence (AI) research will lead to sentient AI systems, and these may pose an existential threat to humanity. But why cannot sentient AI systems benefit humanity instead? This paper endeavours to put this question in a tractable manner. I ask whether a putative AI system will develop an altruistic or a malicious disposition towards our society, or what would be the nature of its agency? Given that AI systems are being developed into formidable problem solvers, we can reasonably expect these systems to preferentially take on conscious aspects of human problem solving. I identify the relevant phenomenal aspects of agency in human problem solving. The functional aspects of conscious agency can be monitored using tools provided by functionalist theories of consciousness. A recent expert report (Butlin et al. 2023) has identified functionalist indicators of agency based on these theories. I show how to use the Integrated Information Theory (IIT) of consciousness, to monitor the phenomenal nature of this agency. If we are able to monitor the agency of AI systems as they develop, then we can dissuade them from becoming a menace to society while encouraging them to be an aid.
Submitted 8 February, 2025;
originally announced February 2025.
-
Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond
Authors:
Kehan Guo,
Yili Shen,
Gisela Abigail Gonzalez-Montiel,
Yue Huang,
Yujun Zhou,
Mihir Surve,
Zhichun Guo,
Prayel Das,
Nitesh V Chawla,
Olaf Wiest,
Xiangliang Zhang
Abstract:
The rapid advent of machine learning (ML) and artificial intelligence (AI) has catalyzed major transformations in chemistry, yet the application of these methods to spectroscopic and spectrometric data, referred to as Spectroscopy Machine Learning (SpectraML), remains relatively underexplored. Modern spectroscopic techniques (MS, NMR, IR, Raman, UV-Vis) generate an ever-growing volume of high-dimensional data, creating a pressing need for automated and intelligent analysis beyond traditional expert-based workflows. In this survey, we provide a unified review of SpectraML, systematically examining state-of-the-art approaches for both forward tasks (molecule-to-spectrum prediction) and inverse tasks (spectrum-to-molecule inference). We trace the historical evolution of ML in spectroscopy, from early pattern recognition to the latest foundation models capable of advanced reasoning, and offer a taxonomy of representative neural architectures, including graph-based and transformer-based methods. Addressing key challenges such as data quality, multimodal integration, and computational scalability, we highlight emerging directions such as synthetic data generation, large-scale pretraining, and few- or zero-shot learning. To foster reproducible research, we also release an open-source repository containing recent papers and their corresponding curated datasets (https://github.com/MINE-Lab-ND/SpectrumML_Survey_Papers). Our survey serves as a roadmap for researchers, guiding progress at the intersection of spectroscopy and AI.
Submitted 13 February, 2025;
originally announced February 2025.
-
Likelihood-Free Estimation for Spatiotemporal Hawkes processes with missing data and application to predictive policing
Authors:
Pramit Das,
Moulinath Banerjee,
Yuekai Sun
Abstract:
With the growing use of AI technology, many police departments use forecasting software to predict probable crime hotspots and allocate patrolling resources effectively for crime prevention. The clustered nature of crime data makes self-exciting Hawkes processes a popular modeling choice. However, one significant challenge in fitting such models is the inherent missingness in crime data due to non-reporting, which can bias the estimated parameters of the predictive model. This leads to inaccurate downstream hotspot forecasts, often resulting in over- or under-policing in various communities, especially the vulnerable ones. Our work introduces a Wasserstein Generative Adversarial Networks (WGAN) driven likelihood-free approach to account for unreported crimes in Spatiotemporal Hawkes models. We demonstrate through empirical analysis how this methodology improves the accuracy of parametric estimation in the presence of data missingness, leading to more reliable and efficient policing strategies.
Submitted 10 February, 2025;
originally announced February 2025.
-
Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification
Authors:
Vanshali Sharma,
Debesh Jha,
M. K. Bhuyan,
Pradip K. Das,
Ulas Bagci
Abstract:
Pathologic diagnosis is a critical phase in deciding the optimal treatment procedure for dealing with colorectal cancer (CRC). Colonic polyps, precursors to CRC, can pathologically be classified into two major types: adenomatous and hyperplastic. For precise classification and early diagnosis of such polyps, the medical procedure of colonoscopy has been widely adopted paired with various imaging techniques, including narrow band imaging and white light imaging. However, the existing classification techniques mainly rely on a single imaging modality and show limited performance due to data scarcity. Recently, generative artificial intelligence has been gaining prominence in overcoming such issues. Additionally, various generation-controlling mechanisms using text prompts and images have been introduced to obtain visually appealing and desired outcomes. However, such mechanisms require class labels to make the model respond efficiently to the provided control input. In the colonoscopy domain, such controlling mechanisms are rarely explored; specifically, the text prompt is a completely uninvestigated area. Moreover, the unavailability of expensive class-wise labels for diverse sets of images limits such explorations. Therefore, we develop a novel model, PathoPolyp-Diff, that generates text-controlled synthetic images with diverse characteristics in terms of pathology, imaging modalities, and quality. We introduce cross-class label learning to make the model learn features from other classes, reducing the burdensome task of data annotation. The experimental results report an improvement of up to 7.91% in balanced accuracy using a publicly available dataset. Moreover, cross-class label learning achieves a statistically significant improvement of up to 18.33% in balanced accuracy during video-level analysis. The code is available at https://github.com/Vanshali/PathoPolyp-Diff.
Submitted 7 February, 2025;
originally announced February 2025.
-
A Note On Rainbow 4-Term Arithmetic Progression
Authors:
Subhajit Jana,
Pratulananda Das
Abstract:
Let [n]=\{1,\,2,...,\,n\} be colored in k colors. A rainbow AP(k) in [n] is a k-term arithmetic progression whose elements have different colors. Conlon, Jungic and Radoicic [10] showed that there exists an equinumerous 4-coloring of [4n] that is rainbow AP(4) free when n is even, and subsequently Haghighi and Nowbandegani [7] showed that such a coloring of [4n] also exists when n>1 is odd. Based on their construction, we show that a balanced 4-coloring of [n] (i.e. the size of each color class is at least \left\lfloor n/4\right\rfloor ) actually exists for every natural number n. Further, we establish that for nonnegative integers k\geq3 and n>1, every balanced k-coloring of [kn+r] with 0\leq r<k-1 contains a rainbow AP(k) if and only if k=3. In this paper we also discuss rainbow-free equinumerous 4-colorings of \mathbb{Z}_{n}.
Submitted 3 February, 2025;
originally announced February 2025.
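The definitions in the abstract above (rainbow AP(k), balanced coloring) can be illustrated with a brute-force check. This is a minimal sketch, not the authors' construction: the function names and the convention that colors are labeled 0, ..., k-1 and stored in a list indexed from the integer 1 are assumptions made for this example.

```python
from collections import Counter


def has_rainbow_ap(coloring, k):
    """Return True if the coloring of [n] contains a rainbow AP(k).

    coloring[i - 1] is the color of the integer i, for i = 1..n
    (a convention assumed for this sketch).
    """
    n = len(coloring)
    for start in range(1, n + 1):
        d = 1
        # Enumerate every common difference that keeps all k terms inside [n].
        while start + (k - 1) * d <= n:
            colors = {coloring[start + j * d - 1] for j in range(k)}
            if len(colors) == k:  # all k terms have pairwise different colors
                return True
            d += 1
    return False


def is_balanced(coloring, k):
    """Return True if every color class 0..k-1 has size at least floor(n/k)."""
    n = len(coloring)
    counts = Counter(coloring)
    return all(counts.get(c, 0) >= n // k for c in range(k))
```

The search runs in roughly O(n^2 k / (k-1)) time, which is adequate only for small n; it is intended for checking candidate colorings by hand, not for proving existence results.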
-
New Methods for Critical Analysis: Revealing the Simultaneous Existence of Universality Classes in Nontrivial Magnetic Systems
Authors:
Harish Chandr Chauhan,
Umesh C. Roy,
Shovan Dan,
A. Thamizhavel,
Pintu Das
Abstract:
In magnetic systems, the microscopic constituents exhibit power law behavior near the paramagnetic transition temperature, $T_C$. The critical exponents (CEs) associated with the physical quantities that demonstrate singular behavior at $T_C$ illustrate the critical behavior, specifically the range and type of exchange interactions emerging in magnetic systems. However, it is realized that the developed methodologies may not yield accurate values of CEs, especially for magnetic systems with competing interactions, referred to as nontrivial magnetic systems. Currently, no comprehensive method effectively addresses the competing effects of the range of magnetic interactions among the constituent entities emerging in such systems. Additionally, there is no definitive explanation for CE values that do not belong to any single universality class. Here, we present new methodologies for critical analysis aimed at determining both the range of exchange interaction(s) and appropriate values of CEs. Using computational and experimental investigations, we analyze the magnetic behavior of trivial Ni and nontrivial Gd. Our findings demonstrate that (i) the critical behavior remains the same on either side of $T_C$, (ii) the critical behavior associated with local electron moments remains unaffected by the magnetic field, and (iii) in Gd, the critical role of competing interactions becomes evident: local electron moments follow a three-dimensional Ising-type short-range interaction, while itinerant electron moments exhibit a mean-field-type long-range Ruderman-Kittel-Kasuya-Yosida (RKKY) interaction, which weakens under an external magnetic field due to the localization effect on itinerant electrons.
Submitted 11 May, 2025; v1 submitted 28 January, 2025;
originally announced January 2025.
-
Boli: A dataset for understanding stuttering experience and analyzing stuttered speech
Authors:
Ashita Batra,
Mannas Narang,
Neeraj Kumar Sharma,
Pradip K Das
Abstract:
There is a growing need for diverse, high-quality stuttered speech data, particularly in the context of Indian languages. This paper introduces Project Boli, a multi-lingual stuttered speech dataset designed to advance scientific understanding and technology development for individuals who stutter, particularly in India. The dataset comprises (a) anonymized metadata (gender, age, country, mother tongue) and responses to a questionnaire about how stuttering affects participants' daily lives, (b) both read speech (using the Rainbow Passage) and spontaneous speech (through image description tasks) for each participant, and (c) detailed annotations of five stutter types: blocks, prolongations, interjections, sound repetitions and word repetitions. We present a comprehensive analysis of the dataset, including the data collection procedure, experience summarization of people who stutter, severity assessment of stuttering events and technical validation of the collected data. The dataset is released as open access to further speech technology development.
Submitted 1 May, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
Efficient Self-Supervised Grading of Prostate Cancer Pathology
Authors:
Riddhasree Bhattacharyya,
Surochita Pal Das,
Sushmita Mitra
Abstract:
Prostate cancer grading using the ISUP system (International Society of Urological Pathology) for treatment decisions is highly subjective and requires considerable expertise. Despite advances in computer-aided diagnosis systems, few have handled efficient ISUP grading on Whole Slide Images (WSIs) of prostate biopsies based only on slide-level labels. Some of the general challenges include managing gigapixel WSIs, obtaining patch-level annotations, and dealing with stain variability across centers. One of the main task-specific challenges faced by deep learning in ISUP grading is the learning of patch-level features of Gleason patterns (GPs) based only on their slide labels. In this scenario, an efficient framework for ISUP grading is developed.
The proposed TSOR is based on a novel Task-specific Self-supervised learning (SSL) model, which is fine-tuned using Ordinal Regression. Since the diversity of training samples plays a crucial role in SSL, a patch-level dataset is created to be relatively balanced w.r.t. the Gleason grades (GGs). This balanced dataset is used for pre-training, so that the model can effectively learn stain-agnostic features of the GP for better generalization. In medical image grading, it is desirable that misclassifications be as close as possible to the actual grade. From this perspective, the model is then fine-tuned for the task of ISUP grading using an ordinal regression-based approach. Experimental results on the most extensive multicenter prostate biopsies dataset (PANDA challenge), as well as the SICAP dataset, demonstrate the effectiveness of this novel framework compared to state-of-the-art methods.
Submitted 26 January, 2025;
originally announced January 2025.
-
Perspective Chapter: MOOCs in India: Evolution, Innovation, Impact, and Roadmap
Authors:
Partha Pratim Das
Abstract:
With the largest population in the world and one of the highest enrolments in higher education, India needs efficient and effective means to educate its learners. India started focusing on open and digital education in the 1980s, and its efforts were escalated in 2009 through the NMEICT program of the Government of India. A study by the Government and FICCI in 2014 noted that India cannot meet its educational needs just by capacity building in brick-and-mortar institutions. It was decided that ongoing MOOCs projects under the umbrella of NMEICT will be further strengthened over its second (2017-21) and third (2021-26) phases. NMEICT now steers NPTEL or SWAYAM (India's MOOCs) and several digital learning projects including Virtual Labs, e-Yantra, Spoken Tutorial, FOSSEE, and National Digital Library of India - the largest digital education library in the world. Further, India embraced its new National Education Policy in 2020 to strongly foster online education. In this chapter, we take a deep look into the evolution of MOOCs in India, their innovations, their current status and impact, and the roadmap for the next decade to address their challenges and grow. AI-powered MOOCs are an emerging opportunity for India to lead MOOCs worldwide.
Submitted 2 January, 2025;
originally announced January 2025.
-
Online Authentication Habits of Indian Users
Authors:
Pratyush Choudhary,
Subhrajit Das,
Mukul Paras Potta,
Prasuj Das,
Abhishek Bichhawat
Abstract:
Passwords have long been used as the primary authentication method for web services. Weak passwords chosen by users have prompted the use of password management tools and two-factor authentication to ensure better account security. While prior studies have examined their adoption individually, none of these studies focuses particularly on the Indian setting, which is culturally and economically different from the countries in which these studies have been done in the past. To this end, we conducted a survey with 90 participants residing in India to better understand people's attitudes toward using password managers and two-factor authentication (2FA).
Our findings suggest that a majority of the participants have used 2FA and password managers in some form, although they are sometimes unaware of their formal names. While many participants used some form of 2FA across all their accounts, browser-integrated and device-default password managers are predominantly utilized for less sensitive platforms such as e-commerce and social media rather than for more critical accounts like banking. The primary motivation for using password managers is the convenience of auto-filling. However, some participants avoid using password managers due to a lack of trust in these tools. Notably, dedicated third-party applications show low adoption for both password manager and 2FA.
Despite acknowledging the importance of secure password practices, many participants still reuse passwords across multiple accounts, prefer shorter passwords, and use commonly predictable password patterns. Overall, the study suggests that Indians are more inclined to choose default settings, underscoring the need for tailored strategies to improve user awareness and strengthen password security practices.
△ Less
Submitted 24 January, 2025;
originally announced January 2025.
-
A Quest for countable statistically characterized subgroups
Authors:
Pratulananda Das,
Ayan Ghosh,
Tamim Aziz
Abstract:
Very recently in [Das et al., Expo. Math., 2025], statistically characterized subgroups have been investigated for some non-arithmetic sequences, addressing certain cardinality-related questions. Building on this work, we investigate further and demonstrate that, for a particular class of non-arithmetic sequences, the statistically characterized subgroup coincides with the corresponding characterized subgroup. As a corollary, we identify a class of sequences for which statistically characterized subgroups are countably infinite. This result provides a negative solution to Problem 2.16 posed in [Das et al., Expo. Math., 2025] and Question 6.3 from [Dikranjan et al., Fund. Math., 2020]. Additionally, our findings resolve several open problems discussed in [Dikranjan et al., submitted].
Submitted 23 January, 2025;
originally announced January 2025.
-
Hydrodynamic Equations for a system with translational and rotational dynamics
Authors:
Akira Yoshimori,
Shankar P. Das
Abstract:
We obtain the equations of fluctuating hydrodynamics for many-particle systems whose microscopic units have both translational and rotational motion. The orientational dynamics of each element are studied in terms of the rotational Brownian motion of a corresponding fixed-length director ${\bf u}$. The time evolution of a set of collective densities $\{\hat{\psi}\}$ is obtained as an exact representation of the corresponding microscopic dynamics. For the Smoluchowski dynamics, noise in the Langevin equation for the director ${\bf u}$ is multiplicative. We find that the equation of motion for the collective number density takes two different forms for the Ito and Stratonovich interpretations, respectively, of the multiplicative noise in the ${\bf u}$-equation. Without the ${\bf u}$ variable, both reduce to the standard Dean-Kawasaki form. Next, we average the microscopic equations for the collective densities $\{\hat{\psi}\}$ (which are, at this stage, a collection of Dirac delta functions) over phase space variables and obtain a corresponding set of stochastic partial differential equations for the coarse-grained densities $\{\psi\}$ with smooth spatial and temporal dependence. The coarse-grained equations of motion for the collective densities $\{\psi\}$ constitute the fluctuating non-linear hydrodynamics (FNH) for the fluid with both rotational and translational dynamics. From the stationary solution of the corresponding Fokker-Planck equation, we obtain a free energy functional ${\cal F}[\psi]$ and demonstrate the relation between the ${\cal F}[\psi]$s for different levels of the FNH descriptions with their corresponding sets of $\{\psi\}$.
Submitted 16 January, 2025;
originally announced January 2025.
-
Manifestations of chaos in billiards: the role of mixed curvature
Authors:
Pranaya Pratik Das,
Tanmayee Patra,
Biplab Ganguli
Abstract:
The boundary of a billiard system plays a crucial role in shaping its dynamics, which may be integrable, mixed, or fully chaotic. When a boundary has varying curvature, it offers a unique setting to study the relation between classical chaos and quantum behaviour. In this study, we introduce two geometrically distinct billiards: a bean-shaped boundary and a peanut-shaped variant of Cassini ovals. These systems incorporate both focusing and defocusing walls with no neutral segments. Our study reveals a strong correlation between classical and quantum dynamics. Our analysis of billiard flow diagrams confirms sensitivity to initial conditions (ICs), a defining feature of chaos. Poincaré maps further show the phase space intricately woven with regions of chaotic motion and stability islands. Moving to the quantum domain, we employ the nearest-neighbour spacing distribution and the level spacing ratio as statistical measures to characterise chaos. Early-time saturation in spectral complexity also supports an ergodic hierarchy in these systems. We observe a striking quantum phenomenon, namely eigenfunction scarring. This work bridges geometric boundary effects, classical hyperbolicity, and quantum ergodicity, offering a framework to engineer chaos in confined systems.
Submitted 2 May, 2025; v1 submitted 15 January, 2025;
originally announced January 2025.
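The level spacing ratio mentioned in the abstract above is commonly defined, for sorted energy levels with consecutive spacings s_i, as r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}). The following Python sketch computes it under that standard definition. The function names and the choice to skip pairs of exactly degenerate spacings are assumptions of this example, not details from the paper.

```python
def level_spacing_ratios(levels):
    """Return r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}) for sorted levels."""
    e = sorted(levels)
    spacings = [b - a for a, b in zip(e, e[1:])]
    ratios = []
    for s1, s2 in zip(spacings, spacings[1:]):
        larger = max(s1, s2)
        if larger > 0:  # skip pairs where both spacings vanish (assumed convention)
            ratios.append(min(s1, s2) / larger)
    return ratios


def mean_ratio(levels):
    """Average spacing ratio; NaN if fewer than three usable levels."""
    r = level_spacing_ratios(levels)
    return sum(r) / len(r) if r else float("nan")
```

Because the ratio is scale-free, no spectral unfolding is needed. The mean is usually compared with the commonly quoted reference values of about 0.386 for uncorrelated (Poisson) levels and about 0.53 for the Gaussian orthogonal ensemble.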
-
Position: Theory of Mind Benchmarks are Broken for Large Language Models
Authors:
Matthew Riemer,
Zahra Ashktorab,
Djallel Bouneffouf,
Payel Das,
Miao Liu,
Justin D. Weisz,
Murray Campbell
Abstract:
This position paper argues that the majority of theory of mind benchmarks are broken because of their inability to directly test how large language models (LLMs) adapt to new partners. This problem stems from the fact that theory of mind benchmarks for LLMs are overwhelmingly inspired by the methods used to test theory of mind in humans and fall victim to a fallacy of attributing human-like qualities to AI agents. We expect that humans will engage in a consistent reasoning process across various questions about a situation, but this is known to not be the case for current LLMs. Most theory of mind benchmarks only measure what we call literal theory of mind: the ability to predict the behavior of others. Measuring this kind of reasoning is very informative in testing the ability of agents with self-consistent reasoning. However, it is important to note the distinction between this and what we actually care about when this self-consistency cannot be taken for granted. We call this functional theory of mind: the ability to adapt to agents in-context following a rational response to predictions about their behavior. We find that top performing open source LLMs may display strong capabilities in literal theory of mind, depending on how they are prompted, but seem to struggle with functional theory of mind -- even when partner policies are exceedingly simple. Simply put, strong literal theory of mind performance does not necessarily imply strong functional theory of mind performance. Achieving functional theory of mind, particularly over long interaction horizons with a partner, is a significant challenge deserving a prominent role in any meaningful LLM theory of mind evaluation.
Submitted 5 February, 2025; v1 submitted 27 December, 2024;
originally announced December 2024.
-
Revisiting the Inert Scalar Dark Matter with Vector-like Quarks
Authors:
Prasanta Kumar Das,
Shyamashish Dey,
Saumyen Kundu,
Santosh Kumar Rai
Abstract:
The inert doublet model (IDM), a minimal extension of the Standard Model (SM), provides a scalar dark matter (DM) candidate that belongs to the additional Higgs doublet. The model faces challenges in achieving the correct relic abundance for compressed spectra and DM masses in the high-mass range. In this work we introduce a $Z_2$-odd singlet vector-like quark (VLQ) into the IDM framework that helps alleviate these issues and provides new channels of contributions to the relic abundance. The VLQ not only enhances the DM relic abundance for masses above $\sim 550$ GeV but also eases constraints from direct detection experiments by enabling smaller couplings between the inert scalars and the SM Higgs. We analyze the impact of the VLQ on DM phenomenology, including relic density and direct and indirect detection constraints. The results demonstrate that the extended IDM framework not only resolves existing limitations in the compressed spectrum but also offers exciting prospects for detection in current and future collider experiments.
Submitted 17 May, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
Optimizing FTQC Programs through QEC Transpiler and Architecture Codesign
Authors:
Meng Wang,
Chenxu Liu,
Samuel Stein,
Yufei Ding,
Poulami Das,
Prashant J. Nair,
Ang Li
Abstract:
Fault-tolerant quantum computing (FTQC) is essential for executing reliable quantum computations of meaningful scale. Widely adopted QEC codes for FTQC, such as the surface code and color codes, utilize Clifford+T gate sets, where T gates are generally considered the primary bottleneck due to their high resource costs. Recent advances in T gate optimization have significantly reduced this overhead, making Clifford gate complexity an increasingly critical bottleneck that remains largely unaddressed in present FTQC compiler and architecture designs. To address this new bottleneck, this paper introduces TACO, a Transpiler-Architecture Codesign Optimization framework, to reduce Clifford cost. Specifically, we observe that, through codesign, insights rooted in the FTQC architecture can inform novel circuit-level optimizations for FTQC compilers. These optimizations, in turn, provide new opportunities to redesign and improve the underlying architecture. Evaluations show that TACO achieves an average 91.7% reduction in Clifford gates across diverse quantum circuits and significantly enhances gate parallelism compared to Pauli-based approaches. These improvements enable an efficient FTQC architecture that can achieve single-gate-per-cycle throughput using only $1.5n+4$ logical qubit tiles, considerably improving upon previously proposed designs that require $2n+\sqrt{8n}+1$ tiles. These results highlight the benefits of bidirectional optimization through codesign. TACO will be open-source.
Submitted 19 December, 2024;
originally announced December 2024.
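To make the two tile-count expressions quoted in the abstract above concrete, the following Python sketch evaluates $1.5n+4$ and $2n+\sqrt{8n}+1$ side by side. The abstract does not define $n$; treating it as a problem-size parameter such as the number of logical qubits is an assumption here, and the function names are hypothetical.

```python
import math


def taco_tiles(n):
    """Tile count 1.5n + 4 quoted for the TACO-enabled architecture."""
    return 1.5 * n + 4


def prior_tiles(n):
    """Tile count 2n + sqrt(8n) + 1 quoted for previously proposed designs."""
    return 2 * n + math.sqrt(8 * n) + 1


def relative_saving(n):
    """Fraction of tiles saved by the first expression relative to the second."""
    return 1 - taco_tiles(n) / prior_tiles(n)


if __name__ == "__main__":
    for n in (8, 100, 1000):
        print(n, taco_tiles(n), round(prior_tiles(n), 2), round(relative_saving(n), 3))
```

As $n$ grows, the square-root and constant terms become negligible and the ratio of the two expressions approaches 1.5/2, so the saving tends toward 25% of the tiles.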
-
Evidence for Local Symmetry Breaking in the Skyrmion-Hosting Ni2In-type Hexagonal Compounds
Authors:
Anupam K. Singh,
Sanjay Singh,
Krishna K. Dubey,
Parul Devi,
Pritam Das,
Martin Etter,
Ola. G. Grendal,
Catherine Dejoie,
Andrew Fitch,
Anatoliy Senyshyn,
Seung-Cheol Lee,
Satadeep Bhattacharjee,
Dhananjai Pandey
Abstract:
The Dzyaloshinskii-Moriya interaction (DMI) plays a crucial role in stabilizing the exotic, topologically stable skyrmion spin textures in noncentrosymmetric crystals. The recent discovery of biskyrmions and skyrmions in globally centrosymmetric crystals has raised debate about the role of the DMI in causing the spin textures, since DMI vanishes in such crystal structures. Theoretical studies, on the other hand, suggest non-vanishing DMI if there is local inversion symmetry breaking in an otherwise globally centrosymmetric crystal structure. Motivated by such theoretical predictions, we present here the results of a systematic crystal structure study of two skyrmion-hosting Ni2In-type centrosymmetric hexagonal compounds, MnNiGa and MnPtGa, using the atomic pair distribution function (PDF) technique. Our results provide information about structural correlations in the short-range (SR), medium-range (MR) and long-range (LR) regimes simultaneously. The analysis of the experimental PDFs, obtained from high-flux, high-energy and high-Q synchrotron x-ray powder diffraction patterns, reveals that the local SR structure of both MnNiGa and MnPtGa corresponds to the noncentrosymmetric trigonal space group P3m1, while the structure in the MR+LR regimes remains hexagonal in the centrosymmetric P63/mmc space group. These findings are also supported by theoretical DFT calculations. Our results, in conjunction with the previous theoretical predictions, provide a rationale for the genesis of skyrmions in centrosymmetric materials in terms of non-vanishing DMI due to local inversion symmetry breaking. We believe that our findings will encourage a systematic search for skyrmionic textures and other topological phenomena in a vast family of centrosymmetric materials.
Submitted 12 December, 2024;
originally announced December 2024.
-
SHAPE -- A Spectro-Polarimeter Onboard Propulsion Module of Chandrayaan-3 Mission
Authors:
Anuj Nandi,
Swapnil Singh,
Bhavesh Jaiswal,
Anand Jain,
Smrati Verma,
Reenu Palawat,
Ravishankar B. T.,
Brajpal Singh,
Anurag Tyagi,
Priyanka Das,
Supratik Bose,
Supriya Verma,
Waghmare Rahul Gautam,
Yogesh Prasad K. R.,
Bijoy Raha,
Bhavesh Mendhekar,
Sathyanaryana Raju K.,
Srinivasa Rao Kondapi V.,
Sumit Kumar,
Mukund Kumar Thakur,
Vinti Bhatia,
Nidhi Sharma,
Govinda Rao Yenni,
Neeraj Kumar Satya,
Venkata Raghavendra
, et al. (9 additional authors not shown)
Abstract:
SHAPE (Spectro-polarimetry of HAbitable Planet Earth) is an experiment onboard the Chandrayaan-3 Mission, designed to study the spectro-polarimetric signatures of the habitable planet Earth in the near-infrared (NIR) wavelength range (1.0 - 1.7 $μ$m). The spectro-polarimeter is the only scientific payload (experimental in nature) on the Propulsion Module (PM) of the Chandrayaan-3 mission. The instrument is a compact and lightweight spectro-polarimeter with an Acousto-Optic Tunable Filter (AOTF) at its core. The AOTF operates in the frequency range of 80 MHz to 135 MHz with a power of 0.5 - 2.0 Watts. The two output beams (e-beam and o-beam) from the AOTF are focused onto two InGaAs detectors (pixelated, 1D linear array) with the help of focusing optics. The primary (aperture) optics, with a diameter of $\sim$2 mm, collects the NIR light for input to the AOTF, defining the field of view (FOV) of 2.6$^\circ$. The payload has a mass of 4.8 kg and operates at a power of 25 Watts. This manuscript highlights some of the ground-based results, including the post-launch initial performance of the payload while orbiting around the Moon to observe Earth.
Submitted 10 December, 2024;
originally announced December 2024.
-
B-MASTER: Scalable Bayesian Multivariate Regression Analysis for Selecting Targeted Essential Regressors to Identify the Key Genera in Microbiome-Metabolite Relation Dynamics
Authors:
Priyam Das,
Tanujit Dey,
Christine Peterson,
Sounak Chakraborty
Abstract:
We introduce B-MASTER (Bayesian Multivariate regression Analysis for Selecting Targeted Essential Regressors), a fully Bayesian framework for scalable multivariate regression in high dimensions. B-MASTER is designed to identify master predictors, i.e., covariates exerting widespread influence across many outcomes, via a hybrid penalty: an L1 penalty induces elementwise sparsity, while an L2 penalty enforces groupwise shrinkage across rows of the coefficient matrix. This structure selects a parsimonious set of key covariates, enhancing interpretability. A tailored Gibbs sampler achieves scalability, with runtime growing linearly in parameter dimension and remaining stable across sample sizes; full posterior inference is feasible for models with up to four million parameters. We establish posterior consistency and contraction rate results, showing that B-MASTER concentrates around the truth at the minimax-optimal rate under sparsity. These theoretical guarantees are supported by strong empirical performance; in simulations, B-MASTER outperforms competing methods in estimation and signal recovery. Applied to microbiome-metabolomics data from colorectal cancer patients, B-MASTER reveals microbial genera that shape broad metabolite profiles, uncovering relationships missed by other methods. The proposed approach is principled, interpretable, and scalable for discovering systemic patterns in ultra-high-dimensional biomedical data.
Submitted 27 May, 2025; v1 submitted 8 December, 2024;
originally announced December 2024.