-
Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
Authors:
Rishikesh Srinivasan,
Dheeraj Nagaraj
Abstract:
We study the problem of sampling from strongly log-concave distributions over $\mathbb{R}^d$ using the Poisson midpoint discretization (a variant of the randomized midpoint method) for overdamped/underdamped Langevin dynamics. We prove its convergence in the 2-Wasserstein distance ($W_2$), achieving a cubic speedup in dependence on the target accuracy ($ε$) over the Euler-Maruyama discretization, surpassing existing bounds for randomized midpoint methods. Notably, in the case of underdamped Langevin dynamics, we demonstrate the complexity of $W_2$ convergence is much smaller than the complexity lower bounds for convergence in $L^2$ strong error established in the literature.
Submitted 9 June, 2025;
originally announced June 2025.
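For orientation, the Euler-Maruyama baseline that the abstract above compares against can be sketched as follows. This is only the standard unadjusted Langevin iteration $x_{k+1} = x_k - h\nabla f(x_k) + \sqrt{2h}\,ξ_k$ for a target proportional to $e^{-f}$, not the Poisson midpoint method of the paper. The function name `langevin_euler_maruyama`, the step size, and the Gaussian test target are assumptions chosen for illustration.

```python
import numpy as np

def langevin_euler_maruyama(grad_f, x0, step, n_steps, rng):
    """Euler-Maruyama discretization of overdamped Langevin dynamics.

    Targets a density proportional to exp(-f); grad_f returns the gradient of f.
    Each row of x0 is treated as an independent chain.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient step plus Gaussian noise scaled by sqrt(2 * step).
        x = x - step * grad_f(x) + np.sqrt(2.0 * step) * noise
    return x

# Hypothetical target: standard Gaussian, f(x) = ||x||^2 / 2, so grad f(x) = x.
rng = np.random.default_rng(0)
samples = langevin_euler_maruyama(lambda x: x, np.zeros((5000, 2)), 0.05, 400, rng)
```

For this Gaussian target the iteration has a slightly inflated stationary variance of $1/(1 - h/2)$, which illustrates the discretization bias that more refined schemes aim to reduce.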
-
Compression, simulation, and synthesis of turbulent flows with tensor trains
Authors:
Stefano Pisoni,
Raghavendra Dheeraj Peddinti,
Egor Tiunov,
Siddhartha E. Guzman,
Leandro Aolita
Abstract:
Numerical simulations of turbulent fluids are paramount to real-life applications, from predicting and modeling flows to diagnostic purposes in engineering. However, they are also computationally challenging due to their intrinsically non-linear dynamics, which require a very high spatial resolution to be described accurately. A promising idea is to represent flows on a discrete mesh using tensor trains (TTs), featuring a convenient scaling of the number of parameters with the mesh size. However, it is not yet clear how the compression power of TTs is affected by the complexity of the flows, measured by the Reynolds number. In fact, no TT fluid solver has been extensively validated in a fully developed turbulent regime yet. We fill this gap. We conduct a comprehensive analysis of TTs as an Ansatz to compress, simulate, and synthetically generate fiducial turbulent snapshots in 3D. Specifically, first, we exhaustively investigate the effect of TT compression of given snapshots on key turbulence signatures, including the energy spectrum and different accuracy metrics. Second, we present a TT solver to simulate time evolution of 3D fluid fields according to the incompressible Navier-Stokes equations entirely within the compressed representation. Third, we develop a TT algorithm to generate artificial snapshots displaying all the signatures of turbulence. In all three cases, a number of parameters scaling polylogarithmically with the mesh size is enough for accurate descriptions. Our findings confirm that fluids in truly turbulent regimes admit an efficient TT description and offer a powerful, quantum-inspired toolkit for their computational treatment.
Submitted 5 June, 2025;
originally announced June 2025.
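As a rough illustration of the tensor-train representation discussed in the abstract above, the generic TT-SVD procedure (sequential truncated SVDs over the modes of a dense array) can be sketched as below. This is not the solver or compression pipeline of the paper; the function names `tt_svd` and `tt_to_dense`, the rank cap `max_rank`, and the small test tensor are assumptions made for the example.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a dense tensor into tensor-train cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    rank = 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))
        # Core k has shape (left rank, mode size, right rank).
        cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
        # Carry the remainder (singular values times right factors) to the next mode.
        mat = (s[:new_rank, None] * vt[:new_rank]).reshape(new_rank * dims[k + 1], -1)
        rank = new_rank
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_to_dense(cores):
    """Contract tensor-train cores back into a dense tensor."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([c.shape[1] for c in cores])

# Hypothetical test data: a rank-1 tensor is represented exactly with max_rank=1.
rng = np.random.default_rng(0)
a, b, c, d = (rng.standard_normal(4) for _ in range(4))
field = np.einsum("i,j,k,l->ijkl", a, b, c, d)
cores = tt_svd(field, max_rank=1)
```

The number of stored parameters is the sum of the core sizes, so a small rank cap trades reconstruction accuracy for memory.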
-
Technical report on a quantum-inspired solver for simulating compressible flows
Authors:
Raghavendra Dheeraj Peddinti,
Stefano Pisoni,
Egor Tiunov,
Alessandro Marini,
Leandro Aolita
Abstract:
This document presents a quantum-inspired solver for 2D Euler equations, accepted at the final phase of the Airbus-BMW Group Quantum Computing Challenge (ABQCC) 2024. We tackle the case study of Quantum Solvers for Predictive Aeroacoustic and Aerodynamic modeling tasks. We propose a tensor network based solver that scales polylogarithmically with the mesh size, in both runtime and memory. This provides a promising avenue for tackling the curse of dimensionality that plagues the direct numerical simulations in the field of computational fluid dynamics.
Submitted 4 June, 2025;
originally announced June 2025.
-
SportMamba: Adaptive Non-Linear Multi-Object Tracking with State Space Models for Team Sports
Authors:
Dheeraj Khanna,
Jerrin Bright,
Yuhao Chen,
John S. Zelek
Abstract:
Multi-object tracking (MOT) in team sports is particularly challenging due to the fast-paced motion and frequent occlusions resulting in motion blur and identity switches, respectively. Predicting player positions in such scenarios is particularly difficult due to the observed highly non-linear motion patterns. Current methods are heavily reliant on object detection and appearance-based tracking, which struggle to perform in complex team sports scenarios, where appearance cues are ambiguous and motion patterns do not necessarily follow a linear pattern. To address these challenges, we introduce SportMamba, an adaptive hybrid MOT technique specifically designed for tracking in dynamic team sports. The technical contribution of SportMamba is twofold. First, we introduce a mamba-attention mechanism that models non-linear motion by implicitly focusing on relevant embedding dependencies. Second, we propose a height-adaptive spatial association metric to reduce ID switches caused by partial occlusions by accounting for scale variations due to depth changes. Additionally, we extend the detection search space with adaptive buffers to improve associations in fast-motion scenarios. Our proposed technique, SportMamba, demonstrates state-of-the-art performance on various metrics in the SportsMOT dataset, which is characterized by complex motion and severe occlusion. Furthermore, we demonstrate its generalization capability through zero-shot transfer to VIP-HTD, an ice hockey dataset.
Submitted 3 June, 2025;
originally announced June 2025.
-
ThinkTank: A Framework for Generalizing Domain-Specific AI Agent Systems into Universal Collaborative Intelligence Platforms
Authors:
Praneet Sai Madhu Surabhi,
Dheeraj Reddy Mudireddy,
Jian Tao
Abstract:
This paper presents ThinkTank, a comprehensive and scalable framework designed to transform specialized AI agent systems into versatile collaborative intelligence platforms capable of supporting complex problem-solving across diverse domains. ThinkTank systematically generalizes agent roles, meeting structures, and knowledge integration mechanisms by adapting proven scientific collaboration methodologies. Through role abstraction, generalization of meeting types for iterative collaboration, and the integration of Retrieval-Augmented Generation with advanced knowledge storage, the framework facilitates expertise creation and robust knowledge sharing. ThinkTank enables organizations to leverage collaborative AI for knowledge-intensive tasks while ensuring data privacy and security through local deployment, utilizing frameworks like Ollama with models such as Llama3.1. The ThinkTank framework is designed to deliver significant advantages in cost-effectiveness, data security, scalability, and competitive positioning compared to cloud-based alternatives, establishing it as a universal platform for AI-driven collaborative problem-solving. The ThinkTank code is available at https://github.com/taugroup/ThinkTank
Submitted 3 June, 2025;
originally announced June 2025.
-
Learning optimal treatment strategies for intraoperative hypotension using deep reinforcement learning
Authors:
Esra Adiyeke,
Tianqi Liu,
Venkata Sai Dheeraj Naganaboina,
Han Li,
Tyler J. Loftus,
Yuanfang Ren,
Benjamin Shickel,
Matthew M. Ruppert,
Karandeep Singh,
Ruogu Fang,
Parisa Rashidi,
Azra Bihorac,
Tezcan Ozrazgat-Baslanti
Abstract:
Traditional methods of surgical decision making heavily rely on human experience and prompt actions, which are variable. A data-driven system generating treatment recommendations based on patient states can be a substantial asset in perioperative decision-making, as in cases of intraoperative hypotension, for which suboptimal management is associated with acute kidney injury (AKI), a common and morbid postoperative complication. We developed a Reinforcement Learning (RL) model to recommend the optimum dose of intravenous (IV) fluid and vasopressors during surgery to avoid intraoperative hypotension and postoperative AKI. We retrospectively analyzed 50,021 surgeries from 42,547 adult patients who underwent major surgery at a quaternary care hospital between June 2014 and September 2020. Of these, 34,186 surgeries were used for model training and 15,835 surgeries were reserved for testing. We developed a Deep Q-Networks based RL model using 16 variables including intraoperative physiologic time series, total dose of IV fluid and vasopressors extracted for every 15-minute epoch. The model replicated 69% of physicians' decisions for the dosage of vasopressors and proposed higher or lower dosage of vasopressors than received in 10% and 21% of the treatments, respectively. In terms of IV fluids, the model's recommendations were within 0.05 ml/kg/15 min of the actual dose in 41% of the cases, with higher or lower doses recommended for 27% and 32% of the treatments, respectively. The model resulted in a higher estimated policy value compared to the physicians' actual treatments, as well as random and zero-drug policies. AKI prevalence was the lowest in patients receiving medication dosages that aligned with the model's decisions. Our findings suggest that implementation of the model's policy has the potential to reduce postoperative AKI and improve other outcomes driven by intraoperative hypotension.
Submitted 27 May, 2025;
originally announced May 2025.
-
Adaptive Estimation and Learning under Temporal Distribution Shift
Authors:
Dheeraj Baby,
Yifei Tang,
Hieu Duy Nguyen,
Yu-Xiang Wang,
Rohit Pyati
Abstract:
In this paper, we study the problem of estimation and learning under temporal distribution shift. Consider an observation sequence of length $n$, which is a noisy realization of a time-varying groundtruth sequence. Our focus is to develop methods to estimate the groundtruth at the final time-step while providing sharp point-wise estimation error rates. We show that, without prior knowledge of the level of temporal shift, a wavelet soft-thresholding estimator provides an optimal estimation error bound for the groundtruth. Our proposed estimation method generalizes existing research (Mazzetto and Upfal, 2023) by establishing a connection between the sequence's non-stationarity level and the sparsity in the wavelet-transformed domain. Our theoretical findings are validated by numerical experiments. Additionally, we apply the estimator to derive sparsity-aware excess risk bounds for binary classification under distribution shift and to develop computationally efficient training objectives. As a final contribution, we draw parallels between our results and the classical signal processing problem of total-variation denoising (Mammen and van de Geer, 1997; Tibshirani, 2014), uncovering novel optimal algorithms for such tasks.
Submitted 21 May, 2025;
originally announced May 2025.
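The wavelet soft-thresholding idea named in the abstract above can be illustrated with a minimal sketch. It uses an orthonormal Haar transform on a signal whose length is a power of two; the paper's actual wavelet family, threshold choice, and estimator details are not given in the abstract, so the helper names (`soft_threshold`, `haar_forward`, `haar_inverse`, `denoise`) and the threshold `lam` are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(coeffs, lam):
    """Shrink coefficients toward zero by lam, zeroing those smaller than lam."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)

def haar_forward(signal):
    """Orthonormal Haar transform of a signal whose length is a power of two."""
    approx = np.asarray(signal, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding the signal from coarse to fine scales."""
    for detail in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def denoise(signal, lam):
    """Soft-threshold the detail coefficients and reconstruct the sequence."""
    approx, details = haar_forward(signal)
    return haar_inverse(approx, [soft_threshold(d, lam) for d in details])
```

Under this sketch, the last entry of `denoise(observations, lam)` would serve as the estimate at the final time step; a sequence with few changes has few non-zero detail coefficients, which is the sparsity the thresholding exploits.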
-
Dilatation-driven spurious dissipation in weakly compressible methods
Authors:
Dheeraj Raghunathan,
Y. Sudhakar
Abstract:
The weakly compressible methods to simulate incompressible flows are in a state of rapid development, owing to the envisaged efficiency they offer for parallel computing. The pressure waves in such methods travel at finite speeds, and hence they yield non-solenoidal velocity fields. This inherent inability to satisfy mass conservation corresponding to incompressible flows is a crucial concern for weakly compressible methods. Another widely reported observation is the progressive enhancement of non-physical dissipation with the increase in the artificial compressibility parameter. By scrutinizing the dilatation terms appearing in the kinetic energy equation, we provide vital insights into the influence of mass conservation error on the accuracy of these methods, and explain the mechanism behind the dissipative nature of the compressibility. Analysing transient laminar and turbulent flows, we show that the dilatation-driven dissipation terms, not the mass conservation error alone, govern the accuracy of weakly compressible methods. The insights provided in this work are not only of fundamental importance but will be of considerable value in aiding the development of weakly compressible methods that can allow a larger artificial Mach number, thus alleviating the stringent time step restriction in such methods.
Submitted 6 May, 2025;
originally announced May 2025.
-
The Tenth NTIRE 2025 Image Denoising Challenge Report
Authors:
Lei Sun,
Hang Guo,
Bin Ren,
Luc Van Gool,
Radu Timofte,
Yawei Li,
Xiangyu Kong,
Hyunhee Park,
Xiaoxuan Yu,
Suejin Han,
Hakjae Jeon,
Jia Li,
Hyung-Ju Chun,
Donghun Ryou,
Inju Ha,
Bohyung Han,
Jingyu Ma,
Zhijuan Huang,
Huiyuan Fu,
Hongyuan Yu,
Boqi Zhang,
Jiawei Shi,
Heng Zhang,
Huadong Ma,
Deepak Kumar Tyagi
, et al. (69 additional authors not shown)
Abstract:
This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent additive white Gaussian noise (AWGN) with a fixed noise level of 50. A total of 290 participants registered for the challenge, with 20 teams successfully submitting valid results, providing insights into the current state-of-the-art in image denoising.
Submitted 16 April, 2025;
originally announced April 2025.
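The evaluation setting described in the abstract above (additive white Gaussian noise with σ = 50, scored by PSNR) can be sketched as follows. The 8-bit intensity range, the absence of clipping after adding noise, and the random test image are assumptions of this example rather than details confirmed by the report; `add_awgn` and `psnr` are hypothetical helper names.

```python
import numpy as np

def add_awgn(image, sigma, rng):
    """Add independent zero-mean Gaussian noise with standard deviation sigma."""
    return image.astype(float) + rng.normal(0.0, sigma, size=image.shape)

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical test image: random 8-bit intensities.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
noisy = add_awgn(clean, sigma=50.0, rng=rng)
noisy_psnr = psnr(clean, noisy)
```

With unclipped noise the mean squared error is close to $σ^2 = 2500$, so the noisy input sits near 14 dB; a denoiser is judged by how far it raises this value.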
-
Adapting to Online Distribution Shifts in Deep Learning: A Black-Box Approach
Authors:
Dheeraj Baby,
Boran Han,
Shuai Zhang,
Cuixiong Hu,
Yuyang Wang,
Yu-Xiang Wang
Abstract:
We study the well-motivated problem of online distribution shift in which the data arrive in batches and the distribution of each batch can change arbitrarily over time. Since the shifts can be large or small, abrupt or gradual, the length of the relevant historical data to learn from may vary over time, which poses a major challenge in designing algorithms that can automatically adapt to the best "attention span" while remaining computationally efficient. We propose a meta-algorithm that takes any network architecture and any Online Learner (OL) algorithm as input and produces a new algorithm which provably enhances the performance of the given OL under non-stationarity. Our algorithm is efficient (it requires maintaining only $O(\log(T))$ OL instances) and adaptive (it automatically chooses OL instances with the ideal "attention" length at every timestamp). Experiments on various real-world datasets across text and image modalities show that our method consistently improves the accuracy of user specified OL algorithms for classification tasks. Key novel algorithmic ingredients include a multi-resolution instance design inspired by wavelet theory and a cross-validation-through-time technique. Both could be of independent interest.
Submitted 9 April, 2025;
originally announced April 2025.
-
On Contact Round Surgeries on $(\mathbb{S}^3,ξ_{st})$ and Their Diagrams
Authors:
Prerak Deep,
Dheeraj Kulkarni
Abstract:
We introduce the notions of contact round surgery of index 1 and 2, respectively, on Legendrian knots in $\left(\mathbb{S}^3, ξ_{st}\right)$ and associate diagrams to them. We realize Jiro Adachi's contact round surgeries as special cases. We show that every closed connected contact 3-manifold can be obtained by performing a sequence of contact round surgeries on some Legendrian link in $\left(\mathbb{S}^3, ξ_{st}\right)$, thus obtaining a contact round surgery diagram for each contact 3-manifold. This is analogous to a similar result of Ding and Geiges for contact Dehn surgeries. We discuss a bridge between certain pairs of contact round surgery diagrams of index 1 and 2 and contact $\pm1$-surgery diagrams. We use this bridge to establish the result mentioned above.
Submitted 8 April, 2025;
originally announced April 2025.
-
Steering off Course: Reliability Challenges in Steering Language Models
Authors:
Patrick Queiroz Da Silva,
Hari Sethuraman,
Dheeraj Rajagopal,
Hannaneh Hajishirzi,
Sachin Kumar
Abstract:
Steering methods for language models (LMs) have gained traction as lightweight alternatives to fine-tuning, enabling targeted modifications to model activations. However, prior studies primarily report results on a few models, leaving critical gaps in understanding the robustness of these methods. In this work, we systematically examine three prominent steering methods -- DoLa, function vectors, and task vectors. In contrast to the original studies, which evaluated a handful of models, we test up to 36 models belonging to 14 families with sizes ranging from 1.5B to 70B parameters. Our experiments reveal substantial variability in the effectiveness of the steering approaches, with a large number of models showing no improvement and at times degradation in steering performance. Our analysis demonstrates fundamental flaws in the assumptions underlying these methods, challenging their reliability as scalable steering solutions.
Submitted 6 April, 2025;
originally announced April 2025.
-
SLACK: Attacking LiDAR-based SLAM with Adversarial Point Injections
Authors:
Prashant Kumar,
Dheeraj Vattikonda,
Kshitij Madhav Bhat,
Kunal Dargan,
Prem Kalra
Abstract:
The widespread adoption of learning-based methods for LiDAR makes autonomous vehicles vulnerable to adversarial attacks through adversarial point injections (PiJ). This poses serious security challenges for navigation and map generation. Despite its critical nature, no major work exists that studies learning-based attacks on LiDAR-based SLAM. Our work proposes SLACK, an end-to-end deep generative adversarial model to attack LiDAR scans with several point injections without deteriorating LiDAR quality. To facilitate SLACK, we design a novel yet simple autoencoder that augments contrastive learning with segmentation-based attention for precise reconstructions. SLACK demonstrates superior performance on the task of point injections (PiJ) compared to the best baselines on the KITTI and CARLA-64 datasets while maintaining accurate scan quality. We qualitatively and quantitatively demonstrate PiJ attacks using a fraction of LiDAR points. The attack severely degrades navigation and map quality without deteriorating the LiDAR scan quality.
Submitted 3 April, 2025;
originally announced April 2025.
-
Beyond Propagation of Chaos: A Stochastic Algorithm for Mean Field Optimization
Authors:
Chandan Tankala,
Dheeraj M. Nagaraj,
Anant Raj
Abstract:
Gradient flow in the 2-Wasserstein space is widely used to optimize functionals over probability distributions and is typically implemented using an interacting particle system with $n$ particles. Analyzing these algorithms requires showing (a) that the finite-particle system converges and/or (b) that the resultant empirical distribution of the particles closely approximates the optimal distribution (i.e., propagation of chaos). However, establishing efficient sufficient conditions can be challenging, as the finite particle system may produce heavily dependent random variables.
In this work, we study the virtual particle stochastic approximation, originally introduced for Stein Variational Gradient Descent. This method can be viewed as a form of stochastic gradient descent in the Wasserstein space and can be implemented efficiently. In popular settings, we demonstrate that our algorithm's output converges to the optimal distribution under conditions similar to those for the infinite particle limit, and it produces i.i.d. samples without the need to explicitly establish propagation of chaos bounds.
Submitted 17 June, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences
Authors:
Adnan Shahid,
Adrian Kliks,
Ahmed Al-Tahmeesschi,
Ahmed Elbakary,
Alexandros Nikou,
Ali Maatouk,
Ali Mokh,
Amirreza Kazemi,
Antonio De Domenico,
Athanasios Karapantelakis,
Bo Cheng,
Bo Yang,
Bohao Wang,
Carlo Fischione,
Chao Zhang,
Chaouki Ben Issaid,
Chau Yuen,
Chenghui Peng,
Chongwen Huang,
Christina Chaccour,
Christo Kurisummoottil Thomas,
Dheeraj Sharma,
Dimitris Kalogiros,
Dusit Niyato,
Eli De Poorter
, et al. (110 additional authors not shown)
Abstract:
This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.
Submitted 6 March, 2025;
originally announced March 2025.
-
Insight into interplay between bandstructure and Coulomb interaction via quasiparticle interference
Authors:
Garima Goyal,
Dheeraj Kumar Singh
Abstract:
Quasiparticle interference has frequently been used to unravel the electronic states in the vicinity of the Fermi level as well as the nature of the superconducting gap in unconventional superconductors. Using the metallic spin-density wave state of iron pnictides as an example, we demonstrate that quasiparticle interference can also be used as a probe to provide crucial insight into the interplay of the electronic bandstructure and correlation effects, in addition to bringing forth the essential features of electronic states in the vicinity of the Fermi level. Our study reveals that the features of the quasiparticle interference pattern can help us narrow down the interaction parameter window and choose a more realistic tight-binding model.
Submitted 3 March, 2025;
originally announced March 2025.
-
Geometrical subordinated Poisson processes and its extensions
Authors:
Neha Gupta,
Aditya Maheshwari,
Dheeraj Goyal
Abstract:
In this paper, we study a generalized version of the Poisson-type process by time-changing it with the geometric counting process. Our work generalizes the work of Meoli (2023). We define the geometric subordinated Poisson process (GSPP), the geometric subordinated compound Poisson process (GSCPP) and the geometric subordinated multiplicative Poisson process (GSMPP) by time-changing the subordinated Poisson process, subordinated compound Poisson process and subordinated multiplicative Poisson process with the geometric counting process, respectively. We derive several distributional properties and many special cases of the above-mentioned processes. We calculate the asymptotic behavior of the correlation structure. We also discuss applications of the time-changed generalized compound Poisson process in shock modelling.
Submitted 26 February, 2025;
originally announced February 2025.
-
Interleaved Gibbs Diffusion for Constrained Generation
Authors:
Gautham Govind Anil,
Sachin Yadav,
Dheeraj Nagaraj,
Karthikeyan Shanmugam,
Prateek Jain
Abstract:
We introduce Interleaved Gibbs Diffusion (IGD), a novel generative modeling framework for mixed continuous-discrete data, focusing on constrained generation problems. Prior works on discrete and continuous-discrete diffusion models assume a factorized denoising distribution for fast generation, which can hinder the modeling of strong dependencies between random variables encountered in constrained generation. IGD moves beyond this by interleaving continuous and discrete denoising algorithms via a discrete time Gibbs sampling type Markov chain. IGD provides flexibility in the choice of denoisers, and allows conditional generation via state-space doubling and inference time scaling via the ReDeNoise method. Empirical evaluations on three challenging tasks (solving 3-SAT, generating molecule structures, and generating layouts) demonstrate state-of-the-art performance. Notably, IGD achieves a 7% improvement on 3-SAT out of the box and achieves state-of-the-art results in molecule generation without relying on equivariant diffusion or domain-specific architectures. We explore a wide range of modeling and interleaving strategies along with hyperparameters in each of these problems.
Submitted 19 February, 2025;
originally announced February 2025.
-
Using Infrared Dust Echoes to Identify Bright Quasi-periodic Eruption Sources
Authors:
Dheeraj R. Pasham,
Eric Coughlin,
Sjoert van Velzen,
Jason Hinkle
Abstract:
Quasi-periodic eruptions (QPEs) are recurring soft X-ray outbursts from galactic nuclei and represent an intriguing new class of transients. Currently, 10 QPE sources are reported in the literature, and a major challenge lies in identifying more because they are (apparently) intrinsically and exclusively X-ray bright. Here we highlight the unusual infrared (IR) echo of the tidal disruption event (TDE) -- and subsequent QPE source -- AT2019qiz, which rose continuously and approximately linearly with time over roughly 1000 days (between 2019 and 2024). We argue that this continuous long rise alongside the relatively high inferred IR temperature (800-1200 K) cannot be generated by the TDE itself, including the late-time/remnant TDE disk, but that the reprocessing of the light from the QPEs by a shell of dust can reproduce the observations. This model predicts 1) IR QPEs at the 0.1 percent level that are potentially detectable with the James Webb Space Telescope, and 2) that if the QPEs cease in AT2019qiz, the IR light curve should decline steadily and linearly over the same 1000-day timescale. We identify another TDE with similar IR behavior, AT2020ysg, which could thus harbor QPEs. Our findings and inferences constitute a novel method for identifying ``bright'' QPEs (with peak bolometric luminosities $\gtrsim$10$^{44}$ erg/sec), i.e., that the follow-up of optically selected TDEs with wide-field infrared surveys can indirectly reveal the presence of QPEs. This approach could be particularly effective with the upcoming Roman telescope, which could detect dozens of QPE candidates for high-cadence X-ray follow-up.
Submitted 17 February, 2025;
originally announced February 2025.
-
Dimension-free Score Matching and Time Bootstrapping for Diffusion Models
Authors:
Syamantak Kumar,
Dheeraj Nagaraj,
Purnamrita Sarkar
Abstract:
Diffusion models generate samples by estimating the score function of the target distribution at various noise levels. The model is trained using samples drawn from the target distribution, progressively adding noise. In this work, we establish the first (nearly) dimension-free sample complexity bounds for learning these score functions, achieving a double exponential improvement in dimension over prior results. A key aspect of our analysis is the use of a single function approximator to jointly estimate scores across noise levels, a critical feature of diffusion models in practice which enables generalization across timesteps. Our analysis introduces a novel martingale-based error decomposition and sharp variance bounds, enabling efficient learning from dependent data generated by Markov processes, which may be of independent interest. Building on these insights, we propose Bootstrapped Score Matching (BSM), a variance reduction technique that utilizes previously learned scores to improve accuracy at higher noise levels. These results provide crucial insights into the efficiency and effectiveness of diffusion models for generative modeling.
Submitted 14 February, 2025;
originally announced February 2025.
-
Multidisciplinary Science in the Multimessenger Era
Authors:
Eric Burns,
Christopher L. Fryer,
Ivan Agullo,
Jennifer Andrews,
Elias Aydi,
Matthew G. Baring,
Eddie Baron,
Peter G. Boorman,
Mohammad Ali Boroumand,
Eric Borowski,
Floor S. Broekgaarden,
Poonam Chandra,
Emmanouil Chatzopoulos,
Hsin-Yu Chen,
Kelly A. Chipps,
Francesca Civano,
Luca Comisso,
Alejandro Cárdenas-Avendaño,
Phong Dang,
Catherine M. Deibel,
Tarraneh Eftekhari,
Courey Elliott,
Ryan J. Foley,
Christopher J. Fontes,
Amy Gall
, et al. (60 additional authors not shown)
Abstract:
Astrophysical observations of the cosmos allow us to probe extreme physics and answer foundational questions about our universe. Modern astronomy is increasingly operating under a holistic approach, probing the same question with multiple diagnostics including how sources vary over time, how they appear across the electromagnetic spectrum, and through their other signatures, including gravitational waves, neutrinos, cosmic rays, and dust on Earth. Astrophysical observations are now reaching the point where approximate physics models are insufficient. Key sources of interest are explosive transients, whose understanding requires multidisciplinary studies at the intersection of astrophysics, gravity, nuclear science, plasma physics, fluid dynamics and turbulence, computation, particle physics, atomic, molecular, and optical science, condensed matter and materials science, radiation transport, and high energy density physics. This white paper provides an overview of the major scientific advances that lie at the intersection of physics and astronomy and are best probed through time-domain and multimessenger astrophysics, an exploration of how multidisciplinary science can be fostered, and introductory descriptions of the relevant scientific disciplines and key astrophysical sources of interest.
Submitted 3 April, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Orbital correlations in bilayer nickelates: roles of doping and interlayer coupling
Authors:
Garima Goyal,
Aastha Jain,
Dheeraj Kumar Singh
Abstract:
We study the nature of orbital correlations present in the bilayer nickelate within a minimal two-orbital tight-binding model to gain insights into their possible role in stabilizing the less-known weakly-insulating state, which has been observed experimentally at ambient pressure. To this end, we examine the static orbital susceptibilities within the random-phase approximation. Our study highlights the sensitivity of orbital correlations to various factors, including the interlayer coupling, the carrier concentration, and band-structure details such as the orbital contents and the number of bands contributing at the Fermi level. We relate this sensitivity to the modification of the Fermi surfaces and of their orbital contents, both of which depend on the aforementioned factors.
Submitted 8 March, 2025; v1 submitted 2 February, 2025;
originally announced February 2025.
-
Role of Dirac cones in the anisotropic properties associated with the spin-density wave state of iron pnictides
Authors:
Garima Goyal,
Dheeraj Kumar Singh
Abstract:
The origin of unusual anisotropic electronic properties in the spin-density wave state of iron pnictides has conventionally been attributed to the breaking of four-fold rotational symmetry associated with the collinear magnetic order. By using a minimal two-orbital model, we show that a significant portion of the contribution to the anisotropy may come from the Dirac cones, which are not far away from the Fermi level. We demonstrate this phenomenon by examining optical conductivity and quasiparticle interference in the Dirac-semimetallic state with spin-density wave order, and the latter can be obtained by choosing appropriate interaction parameters and orbital splitting between the $d_{xz}$ and $d_{yz}$ orbitals. We further extend this study to investigate the low-energy spin-wave excitations in the Dirac-semimetallic state with spin-density wave order.
Submitted 2 February, 2025;
originally announced February 2025.
-
On Round Surgery Diagrams For 3-Manifolds
Authors:
Prerak Deep,
Dheeraj Kulkarni
Abstract:
We introduce the notion of round surgery diagrams in $S^3$ for representing 3-manifolds similar to Dehn surgery diagrams. We give a correspondence between a certain class of round surgery diagrams and Dehn surgery diagrams for 3-manifolds. As a consequence, we recover Asimov's result, stating that any closed connected oriented 3-manifold can be obtained by a round surgery on a framed link in $S^3$. There may be more than one round surgery diagram giving rise to the same 3-manifold. Thus, it is natural to ask whether there is a version of Kirby Calculus for round surgery diagrams, similar to the case of Dehn surgery diagrams with integral framings. In this direction, we define four types of moves on round surgery diagrams such that any two round surgery diagrams corresponding to the same 3-manifold can be obtained one from another by a finite sequence of these moves, thereby establishing a version of Kirby Calculus. As an application, we prove the existence of taut foliations, and hence of tight contact structures, on 3-manifolds obtained by round 1-surgery on fibred links with two components in $S^3$.
Submitted 16 January, 2025;
originally announced January 2025.
-
Time-resolved Hubble Space Telescope UV observations of an X-ray quasi-periodic eruption source
Authors:
Thomas Wevers,
Muryel Guolo,
Sean Lockwood,
Andrew Mummery,
Dheeraj R. Pasham,
Riccardo Arcodia
Abstract:
X-ray quasi-periodic eruptions (QPEs) are a novel mode of variability in nearby galactic nuclei whose origin remains unknown. Their multi-wavelength properties are poorly constrained, as studies have focused almost entirely on the X-ray band. Here we report on time-resolved, coordinated Hubble Space Telescope far ultraviolet and XMM-Newton X-ray observations of the shortest period X-ray QPE source currently known, eRO-QPE2. We detect a bright UV point source ($L_{\rm FUV} \approx {\rm few} \times 10^{41}$ erg s$^{-1}$) that does not show statistically significant variability between the X-ray eruption and quiescent phases. This emission is unlikely to be powered by a young stellar population in a nuclear stellar cluster. The X-ray-to-UV spectral energy distribution can be described by a compact accretion disk ($R_{\rm out} = 343^{+202}_{-138} \ R_{\rm g}$). Such compact disks are incompatible with typical disks in active galactic nuclei, but form naturally following the tidal disruption of a star. Our results rule out models (for eRO-QPE2) invoking i) a classic AGN accretion disk and ii) no accretion disk at all. For orbiter models, the expected radius derived from the timing properties would naturally lead to disk-orbiter interactions for both quasi-spherical and eccentric trajectories. We infer a black hole mass of log$_{10}(M_{\rm BH}) = 5.9 \pm 0.3$ M$_{\odot}$ and Eddington ratio of 0.13$^{+0.18}_{-0.07}$; in combination with the compact outer radius this is inconsistent with existing disk instability models. After accounting for the quiescent disk emission, we constrain the ratio of X-ray to FUV luminosity of the eruption component to be $L_{\rm X} / L_{\rm FUV} > 16-85$ (depending on the intrinsic extinction).
Submitted 23 January, 2025; v1 submitted 6 January, 2025;
originally announced January 2025.
-
Revisiting Point Cloud Completion: Are We Ready For The Real-World?
Authors:
Stuti Pathak,
Prashant Kumar,
Dheeraj Baiju,
Nicholus Mboga,
Gunther Steenackers,
Rudi Penne
Abstract:
Point clouds acquired in constrained, challenging, uncontrolled, and multi-sensor real-world settings are noisy, incomplete, and non-uniformly sparse. This presents acute challenges for the vital task of point cloud completion. Using tools from Algebraic Topology and Persistent Homology (PH), we demonstrate that current benchmark object point clouds lack rich topological features that are an integral part of point clouds captured in realistic environments. To facilitate research in this direction, we contribute the first real-world industrial dataset for point cloud completion, RealPC, a diverse, rich and varied set of point clouds. It consists of approximately 40,000 pairs across 21 categories of industrial structures in railway establishments. Benchmark results on several strong baselines reveal that existing methods fail in real-world scenarios. We make a striking observation: unlike current datasets, RealPC contains multiple 0- and 1-dimensional PH-based topological features. We prove that integrating these topological priors into existing works helps improve completion. We show how 0-dimensional PH priors extract the global topology of a complete shape in the form of a 3D skeleton and assist a model in generating topologically consistent complete shapes. Since computing Homology is expensive, we present a simple yet effective Homology Sampler guided network, BOSHNet, that bypasses the Homology computation by sampling proxy backbones akin to 0-dim PH. These backbones provide benefits similar to those of 0-dim PH right from the start of training, unlike similar methods where accurate backbones are obtained only during later phases of training.
Submitted 11 March, 2025; v1 submitted 26 November, 2024;
originally announced November 2024.
-
Perfect absorption of molecular vibration enabled by critical coupling in molecular metamaterial
Authors:
Govind Dayal,
Dheeraj Pratap
Abstract:
The absorption and emission spectrum arising from the vibrational motion of a molecule is mostly in the infrared region. These fingerprint absorptions of polar bonds enable us to acquire bond-specific chemical information from specimens. However, the mode mismatch between the atomic-scale dimensions of the chemical bonds and the resonance wavelength limits the direct detection of tiny amounts of samples such as self-assembled monolayers or biological membranes. To overcome this limitation, surface-enhanced infrared absorption spectroscopy (SEIRA) has been proposed to enhance infrared absorption directly via local field enhancement. Here, we report on the perfect absorption of molecular vibration enabled by critical coupling in the metamaterials. Our molecular metamaterial design consists of a thin polymer layer sandwiched between a structured metal layer on top and a continuous metal layer at the bottom that supports the gap plasmon mode. The measured and simulated infrared spectra of the molecular metamaterial show broad and narrow absorption bands corresponding to the metamaterial and molecular vibration modes. We show that by tuning the structure's molecular film thickness and periodicity, vibrational absorption can be enhanced to near unity. We also show that for a particular periodicity of the array, metamaterial resonance can be completely suppressed, and only molecular vibrational absorption is excited, giving rise to an extremely narrow absorption band.
Submitted 13 November, 2024;
originally announced November 2024.
-
Repeated Partial Tidal Disruptions and Quasi-Periodic Eruptions in SwJ023017.0+283603
Authors:
Dheeraj Pasham,
Eric Coughlin,
Chris Nixon,
Michal Zajacek,
Petra Sukova,
Vladimir Karas,
Thomas Wevers,
Francesco Tombesi
Abstract:
SwJ023017.0+283603 (SwJ0230) exhibited soft X-ray (0.3-1.0 keV) eruptions recurring roughly every 22 days. We present results from an extended monitoring campaign of SwJ0230 using Swift, NICER, and deep XMM-Newton observations. Our main findings are: 1) SwJ0230 did not display any eruptions during two 80-day periods (June-September 2023 and July-September 2024) of high-cadence monitoring with NICER and Swift, suggesting that the eruptions have ceased, implying an eruption lifetime of less than 536 days; 2) quiescent/non-eruption emission is detected with XMM-Newton, with a 0.3-2.0 keV luminosity of 4$\times$10$^{40}$ erg/s (bolometric luminosity of $<$0.1% Eddington assuming a black hole mass of 10$^{6-7}$ M$_{\odot}$), that is consistent with a thermal disk spectrum peaking at 0.11$^{+0.06}_{-0.03}$ keV; 3) SwJ0230 exhibited multiple, rapid eruptions (duration$<$5 hours, similar to quasi-periodic eruptions; QPEs), and there is tentative evidence that they recur, on average, on roughly the same timescale of 22 days. SwJ0230 therefore exhibited (when active) both rapid, QPE-like outbursts and longer-duration outbursts, more akin to those from repeating partial Tidal Disruption Event (rpTDE) candidates. These findings are difficult to explain with existing models that invoke an orbiter interacting with a persistent disk and those involving disk instabilities. We propose a hybrid model wherein an object of smaller mass (e.g., a Jupiter-sized planet) being repeatedly partially stripped and subsequently punching through its own, fallback-induced disk, can explain many of the observed properties, including the long-duration flares (from accretion), the short-duration outbursts (from the planet-disk interaction), and the turn-off of the flares (when the planet is totally stripped of gas).
Submitted 8 November, 2024;
originally announced November 2024.
-
Repeating transients in galactic nuclei: confronting observations with theory
Authors:
Petra Suková,
Francesco Tombesi,
Dheeraj R. Pasham,
Michal Zajaček,
Thomas Wevers,
Taeho Ryu,
Itai Linial,
Alessia Franchini
Abstract:
In the last few years, a mysterious new class of astrophysical objects has been uncovered. These are spatially coincident with the nuclei of external galaxies and show X-ray variations that repeat on timescales of minutes to a month. They manifest in three different ways in the data: stable quasi-periodic oscillations (QPOs), quasi-periodic eruptions (QPEs) and quasi-periodic outflows (QPOuts). QPOs are systems that show smooth recurrent X-ray brightness variations while QPEs are sudden changes that appear like eruptions. QPOuts represent systems that exhibit repeating outflows moving at mildly-relativistic velocities of about 0.1-0.3c, where c is the speed of light. Their underlying physical mechanism is a topic of heated debate, with most models proposing that they originate either from instabilities within the inner accretion flow or from orbiting objects. The latter class of models has generated particular excitement, as it has been argued that some repeating systems could host extreme mass-ratio inspirals potentially detectable with upcoming space-based gravitational wave interferometers, paving the path for an era of "persistent" multi-messenger astronomy. Here we summarize the recent findings on the topic, including the newest observational data, various physical models and their numerical implementation.
Submitted 7 November, 2024;
originally announced November 2024.
-
Alive and Strongly Kicking: Stable X-ray Quasi-Periodic Eruptions from eRO-QPE2 over 3.5 Years
Authors:
Dheeraj Pasham,
Shubham Kejriwal,
Eric Coughlin,
Vojtěch Witzany,
Alvin J. K. Chua,
Michal Zajaček,
Thomas Wevers,
Yukta Ajay
Abstract:
Quasi-periodic eruptions (QPEs) are recurring bursts of soft X-rays from the nuclei of galaxies. Their physical origin is currently a subject of debate, with models typically invoking an orbiter around a massive black hole or disk instabilities. Here we present and analyze the temporal and spectral evolution of the QPE source eRO-QPE2 over 3.5 years. We find that eRO-QPE2 1) is remarkably stable over the entire 3.5-year temporal baseline in its eruption peak luminosity, eruption temperature, quiescent temperature, and quiescent luminosity, 2) has a stable mean eruption recurrence time of 2.35 hours, with marginal ($\sim$2$σ$) evidence for a $0.1$ hour reduction over the 3.5-year period, and 3) has a long-short variation in its recurrence time in August 2020, but this pattern is absent from all subsequent observations. The stability of its peak eruption luminosity and that of the quiescent state are notably dissimilar from three previously tracked QPEs (GSN069, eRO-QPE1, eRO-QPE3), which show declines in eruption and quiescent flux over comparable temporal baselines. This stability is even more pronounced in eRO-QPE2 due to its 2.4-hour average recurrence time compared to GSN069's 9-hour, eRO-QPE1's 16-hour, and eRO-QPE3's 20-hour recurrence times, i.e., this system has undergone 4-8 times more cycles than these other systems over the 3.5 years of observations. We discuss the implications of these observations within the context of some proposed extreme mass ratio inspiral (EMRI) models.
Submitted 31 October, 2024;
originally announced November 2024.
-
Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD
Authors:
Aniket Das,
Dheeraj Nagaraj,
Soumyabrata Pal,
Arun Suggala,
Prateek Varshney
Abstract:
We consider the problem of high-dimensional heavy-tailed statistical estimation in the streaming setting, which is much harder than the traditional batch setting due to memory constraints. We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped-SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. More precisely, with $T$ samples, we show that Clipped-SGD, for smooth and strongly convex objectives, achieves an error of $\sqrt{\frac{\mathsf{Tr}(Σ)+\sqrt{\mathsf{Tr}(Σ)\|Σ\|_2}\log(\frac{\log(T)}δ)}{T}}$ with probability $1-δ$, where $Σ$ is the covariance of the clipped gradient. Note that the fluctuations (depending on $\frac{1}δ$) are of lower order than the term $\mathsf{Tr}(Σ)$. This improves upon the current best rate of $\sqrt{\frac{\mathsf{Tr}(Σ)\log(\frac{1}δ)}{T}}$ for Clipped-SGD, known only for smooth and strongly convex objectives. Our results also extend to smooth convex and Lipschitz convex objectives. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of Catoni and Giulini.
Submitted 26 October, 2024;
originally announced October 2024.
-
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Authors:
Tyler A. Chang,
Dheeraj Rajagopal,
Tolga Bolukbasi,
Lucas Dixon,
Ian Tenney
Abstract:
Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples, and the application of these methods to large language model (LLM) outputs could significantly advance model transparency and data curation. However, it has been challenging to date to apply these methods to the full scale of LLM pretraining. In this paper, we refine existing gradient-based methods to work effectively at scale, allowing us to retrieve influential examples for an 8B-parameter language model from a pretraining corpus of over 160B tokens with no need for subsampling or pre-filtering. Our method combines several techniques, including optimizer state correction, a task-specific Hessian approximation, and normalized encodings, which we find to be critical for performance at scale. In quantitative evaluations on a fact tracing task, our method performs best at identifying examples that influence model predictions, but classical, model-agnostic retrieval methods such as BM25 still perform better at finding passages which explicitly contain relevant facts. These results demonstrate a misalignment between factual *attribution* and causal *influence*. With increasing model size and training tokens, we find that influence more closely aligns with factual attribution. Finally, we examine different types of examples identified as influential by our method, finding that while many directly entail a particular fact, others support the same output by reinforcing priors on relation types, common entities, and names. We release our prompt set and model outputs, along with a web-based visualization tool to explore influential examples for factual predictions, commonsense reasoning, arithmetic, and open-ended generation for an 8B-parameter LLM.
Submitted 20 December, 2024; v1 submitted 22 October, 2024;
originally announced October 2024.
-
Revealing EMRI/IMRI candidates with quasiperiodic ultrafast outflows
Authors:
Michal Zajaček,
Petra Suková,
Vladimír Karas,
Dheeraj R. Pasham,
Francesco Tombesi,
Petr Kurfürst,
Henry Best,
Izzy Garland,
Matúš Labaj,
Monika Pikhartová
Abstract:
The first detection of a quasiperiodic ultrafast outflow, in the ASASSN-20qc system, was reported by Pasham et al. (2024). The outflow is revealed in the soft X-ray spectra as an absorption feature, which is enhanced periodically every $\sim 8.3$ days. The repetitive nature of the ultrafast outflow is tentatively explained by an orbiting massive perturber, possibly an intermediate-mass black hole (IMBH), whose trajectory is inclined with respect to the accretion flow around the primary supermassive black hole (SMBH). In this scenario, the orbiting body pushes the disc gas into the outflow funnel, where it is accelerated by the ordered magnetic field (Suková et al. 2021). Quasiperiodic ultrafast outflows (a.k.a. QPOuts) are thus a novel phenomenon that can help reveal new extreme-/intermediate-mass ratio inspiral (EMRI/IMRI) candidates. These would then be prime candidate sources for simultaneous detection and monitoring in the electromagnetic and gravitational wave domains.
Submitted 3 April, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.
-
The STROBE-X Wide Field Monitor Instrument
Authors:
Ronald A. Remillard,
Margarita Hernanz,
Jean in 't Zand,
Paul S. Ray,
Valter Bonvicini,
Søren Brandt,
Terri Brandt,
Alex Carmona,
Yuri Evangelista,
Daniel Alvarez Franco,
Cynthia Froning,
Jose-Luis Galvez,
Gianluigi De Geronimo,
Martin Grim,
Emrah Kalemci,
Lucien Kuiper,
Irfan Kuvvetli,
Thomas J. Maccarone,
Witold Nowosielski,
Dheeraj R. R. Pasham,
Alessandro Patruno,
Steven C. Persyn,
Peter W. A. Roming,
Andrea Santangelo,
Stephane Schanne
, et al. (4 additional authors not shown)
Abstract:
The Wide Field Monitor (WFM) is one of the three instruments on the Spectroscopic Time-Resolving Observatory for Broadband Energy X-rays (STROBE-X) mission, which was proposed in response to the NASA 2023 call for a probe class mission. The WFM is a coded-mask camera system that would be the most scientifically capable wide-angle monitor ever flown. The field of view covers one third of the sky, to 50 percent mask coding, and the energy sensitivity is 2 to 50 keV. The WFM is designed to identify new X-ray transients and to capture spectral and timing changes in known sources with data of unprecedented quality. Science applications cover diverse classes, including X-ray bursts that coincide with gravitational wave detections, gamma ray bursts and their transition from prompt emission to afterglow, subluminous GRBs that may signal shock breakout in supernovae, state transitions in accreting compact objects and their jets, bright flares in fast X-ray transients, accretion onset in transitional pulsars, and coronal flares from many types of active stars.
Submitted 15 October, 2024;
originally announced October 2024.
-
Model Predictive Control is Almost Optimal for Restless Bandit
Authors:
Nicolas Gast,
Dheeraj Narasimha
Abstract:
We consider the discrete time infinite horizon average reward restless Markovian bandit (RMAB) problem. We propose a \emph{model predictive control} based non-stationary policy with a rolling computational horizon $τ$. At each time-slot, this policy solves a $τ$ horizon linear program whose first control value is kept as a control for the RMAB. Our solution requires minimal assumptions and quantifies the loss in optimality in terms of $τ$ and the number of arms, $N$. We show that its sub-optimality gap is $O(1/\sqrt{N})$ in general, and $\exp(-Ω(N))$ under a local-stability condition. Our proof is based on a framework from dynamic control known as \emph{dissipativity}. Our solution is easy to implement and performs very well in practice when compared to the state of the art. Further, both our solution and our proof methodology can easily be generalized to more general constrained MDP settings and should thus be of great interest to the burgeoning RMAB community.
Submitted 5 June, 2025; v1 submitted 8 October, 2024;
originally announced October 2024.
-
New developments on the Ingot WFS laboratory testing
Authors:
Tânia Gomes Machado,
Simone Di Filippo,
Kalyan K. R. Santhakumari,
Maria Bergomi,
Davide Greggio,
Elisa Portaluri,
Dheeraj Malik,
César Nesme,
Carmelo Arcidiacono,
Alessandro Ballone,
Federico Battaini,
Valentina Viotto,
Roberto Ragazzoni,
Marco Dima,
Luca Marafatto,
Jacopo Farinato,
Demetrio Magrin,
Luigi Lessio,
Gabriele Umbriaco
Abstract:
The Ingot WFS was designed to overcome some of the challenges present in classical wavefront sensors when they deal with sodium LGSs. This innovative sensor works by sensing the full 3D volume of the elongated LGS and is suitable for use in very large telescopes. A test bench has been assembled at the INAF - Osservatorio Astronomico di Padova laboratories to test and characterize the functioning of the Ingot WFS. In this work, we summarize the main results of the tests performed on a new search algorithm. Then, we move towards a more accurate simulation of the sodium LGS by replicating real time-varying sodium layer profiles. The study of their impact on the ingot pupil signals is described in this work.
Submitted 8 October, 2024;
originally announced October 2024.
-
A long-duration superflare on the K giant HD 251108
Authors:
Hans Moritz Günther,
Dheeraj Pasham,
Alexander Binks,
Stefan Czesla,
Teruaki Enoto,
Michael Fausnaugh,
Franz-Josef Hambsch,
Shun Inoue,
Hiroyuki Maehara,
Yuta Notsu,
Jan Robrade,
J. H. M. M. Schmitt,
P. C. Schneider
Abstract:
Many giant stars are magnetically active, which causes rotational variability, chromospheric emission lines, and X-ray emission. Large outbursts in these emission features can set limits on the magnetic field strength and thus constrain the mechanism of the underlying dynamo. HD~251108 is a Li-rich active K-type giant. We find a rotational period of 21.3~d with color changes and additional long-term photometric variability. Both can be explained with very stable stellar spots. We followed the decay phase of a superflare for 28 days with NICER and from the ground. We track the flare decay in unprecedented detail in several coronal temperature components. With a peak flux around $10^{34}$~erg~s$^{-1}$ (0.5-4.0~keV) and an exponential decay time of 2.2~days in the early decay phase, this is one of the strongest flares ever observed; yet it follows trends established from samples of smaller flares, for example for the relations between H$α$ and X-ray flux, indicating that the physical process that powers the flare emission is consistent over a large range of flare energies. We estimate a flare loop length about 2-4 times the stellar radius. No evidence is seen for abundance changes during the flare.
Submitted 4 October, 2024;
originally announced October 2024.
-
Self-organization and memory in a disordered solid subject to random loading
Authors:
Muhittin Mungan,
Dheeraj Kumar,
Sylvain Patinet,
Damien Vandembroucq
Abstract:
We consider self-organization and memory formation in a mesoscopic model of an amorphous solid subject to a random shear strain protocol confined to a strain range $\pm \varepsilon_{\rm max}$. We develop proper read-out protocols to show that the response of the driven system retains a memory of the strain range, which can be subsequently retrieved. Our findings generalize previous results obtained upon oscillatory driving and suggest that self-organization and memory formation of disordered materials can emerge under more general conditions, such as a disordered system interacting with its fluctuating environment. The self-organization results in a correlation between the dynamics of the system and its environment. We conclude by discussing our results within the context of environmental sensing, highlighting their generalizability to adaptation strategies of simple organisms under changing conditions.
Submitted 22 January, 2025; v1 submitted 25 September, 2024;
originally announced September 2024.
-
Possible pairing states in the superconducting bilayer nickelate
Authors:
Dheeraj Kumar Singh,
Garima Goyal,
Yunkyu Bang
Abstract:
We examine various possibilities for the pairing mechanisms in the recently discovered bilayer-nickelate superconductor within the Bardeen-Cooper-Schrieffer framework. Unlike earlier studies, where only a pure $d$-wave or sign-changing $s$-wave superconductivity instability was investigated, our study explores the possibilities of mixed-state superconducting instability such as the one involving both $d$- and sign-changing $s$-waves. While assuming that the superconductivity arises because of the magnetic correlations, we examine the nature of the superconducting gap function and the associated density of states for various possible magnetic correlation wavevectors, which arise from the multiple pockets produced by the multiple orbitals and bilayer splitting. We also explore the effect of differences in the nature of Fermi surfaces suggested by various studies.
Submitted 4 October, 2024; v1 submitted 14 September, 2024;
originally announced September 2024.
-
Self-organization and memory in a cyclically driven elasto-plastic model of an amorphous solid
Authors:
Dheeraj Kumar,
Muhittin Mungan,
Sylvain Patinet,
Damien Vandembroucq
Abstract:
The mechanical behavior of disordered materials such as dense suspensions, glasses or granular materials depends on their thermal and mechanical past. Here we report the memory behavior of a quenched mesoscopic elasto-plastic (QMEP) model. After prior oscillatory training, a simple read-out protocol gives access to both the training protocol's amplitude and the last shear direction. The memory of direction emerges from the development of a mechanical polarization during training. The analysis of sample-to-sample fluctuations gives direct access to the irreversibility transition. Despite the quadrupolar nature of the elastic interactions in amorphous solids, a behavior close to Return Point Memory (RPM) is observed. The quasi RPM property is used to build a simple Preisach-like model of directional memory.
Submitted 11 September, 2024;
originally announced September 2024.
-
Probing 4 X 4 quark mixing matrix
Authors:
Gurjit Kaur,
Gulsheen Ahuja,
Dheeraj Shukla,
Manmohan Gupta
Abstract:
Without adhering to any specific model, we have presented a 4 X 4 quark mixing matrix as an extension of the 3 X 3 PDG parametrization of the CKM matrix. Using unitarity constraints as well as the hierarchy among the elements of the 3 X 3 CKM matrix, we have found the hierarchy among the 4th row and 4th column elements of the 4 X 4 quark mixing matrix. Further, for the fourth generation case, we have explicitly found the 9 independent rephasing invariant parameters J_4X4. Also, using phenomenological estimates of the 4th row and 4th column elements, we have numerically evaluated these 9 parameters.
Submitted 14 August, 2024;
originally announced August 2024.
-
Online Matrix Completion: A Collaborative Approach with Hott Items
Authors:
Dheeraj Baby,
Soumyabrata Pal
Abstract:
We investigate the low rank matrix completion problem in an online setting with ${M}$ users, ${N}$ items, ${T}$ rounds, and an unknown rank-$r$ reward matrix ${R}\in \mathbb{R}^{{M}\times {N}}$. This problem has been well-studied in the literature and has several applications in practice. In each round, we recommend ${S}$ carefully chosen distinct items to every user and observe noisy rewards. In the regime where ${M},{N} \gg {T}$, we propose two distinct computationally efficient algorithms for recommending items to users and analyze them under the benign \emph{hott items} assumption. 1) First, for ${S}=1$, under additional incoherence/smoothness assumptions on ${R}$, we propose the phased algorithm \textsc{PhasedClusterElim}. Our algorithm obtains a near-optimal per-user regret of $\tilde{O}({N}{M}^{-1}(Δ^{-1}+Δ_{hott}^{-2}))$ where $Δ_{hott},Δ$ are problem-dependent gap parameters with $Δ_{hott} \gg Δ$ almost always. 2) Second, we consider a simplified setting with ${S}=r$ where we make significantly milder assumptions on ${R}$. Here, we introduce another phased algorithm, \textsc{DeterminantElim}, to derive a regret guarantee of $\widetilde{O}({N}{M}^{-1/r}Δ_{det}^{-1})$ where $Δ_{det}$ is another problem-dependent gap. Both algorithms crucially use collaboration among users to jointly eliminate sub-optimal items for groups of users successively in phases, but with distinctive and novel approaches.
Submitted 11 August, 2024;
originally announced August 2024.
-
The Bandit Whisperer: Communication Learning for Restless Bandits
Authors:
Yunfan Zhao,
Tonghan Wang,
Dheeraj Nagaraj,
Aparna Taneja,
Milind Tambe
Abstract:
Applying Reinforcement Learning (RL) to Restless Multi-Arm Bandits (RMABs) offers a promising avenue for addressing allocation problems with resource constraints and temporal dynamics. However, classic RMAB models largely overlook the challenges of (systematic) data errors - a common occurrence in real-world scenarios due to factors like varying data collection protocols and intentional noise for differential privacy. We demonstrate that conventional RL algorithms used to train RMABs can struggle to perform well in such settings. To solve this problem, we propose the first communication learning approach in RMABs, where we study which arms, when involved in communication, are most effective in mitigating the influence of such systematic data errors. In our setup, the arms receive Q-function parameters from similar arms as messages to guide behavioral policies, steering Q-function updates. We learn communication strategies by considering the joint utility of messages across all pairs of arms and using a Q-network architecture that decomposes the joint utility. Both theoretical and empirical evidence validate the effectiveness of our method in significantly improving RMAB performance across diverse problems.
Submitted 19 March, 2025; v1 submitted 10 August, 2024;
originally announced August 2024.
-
Hierarchy of CKM matrix elements and implications of unitarity
Authors:
Gurjit Kaur,
Gulsheen Ahuja,
Dheeraj Shukla,
Manmohan Gupta
Abstract:
The hierarchy amongst the CKM matrix elements, highlighted recently by Luo and Xing, has been rigorously revisited using the PDG parameterization incorporating unitarity constraints. Further, we have explored the evaluation of the CP violating parameter ε_k for the 9 possible unitarity ensured equivalent independent parametrizations of the CKM matrix. Interestingly, we find that not all of these reproduce the value of the parameter ε_k, presumably because of the hierarchical nature of the CKM matrix elements. This situation echoes similar conclusions regarding the evaluation of Jarlskog's rephasing invariant parameter J through unitarity ensured equivalent possibilities.
Submitted 1 August, 2024;
originally announced August 2024.
-
CONGO: Compressive Online Gradient Optimization
Authors:
Jeremy Carleton,
Prathik Vijaykumar,
Divyanshu Saxena,
Dheeraj Narasimha,
Srinivas Shakkottai,
Aditya Akella
Abstract:
We address the challenge of zeroth-order online convex optimization where the objective function's gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function's gradient even when the only information available is a limited number of function samples. Our motivation stems from the optimization of large-scale queueing networks that process time-sensitive jobs. Here, a job must be processed by potentially many queues in sequence to produce an output, and the service time at any queue is a function of the resources allocated to that queue. Since resources are costly, the end-to-end latency for jobs must be balanced with the overall cost of the resources used. While the number of queues is substantial, the latency function primarily reacts to resource changes in only a few, rendering the gradient sparse. We tackle this problem by introducing the Compressive Online Gradient Optimization framework which allows compressive sensing methods previously applied to stochastic optimization to achieve regret bounds with an optimal dependence on the time horizon without the full problem dimension appearing in the bound. For specific algorithms, we reduce the samples required per gradient estimate to scale with the gradient's sparsity factor rather than its full dimensionality. Numerical simulations and real-world microservices benchmarks demonstrate CONGO's superiority over gradient descent approaches that do not account for sparsity.
Submitted 16 May, 2025; v1 submitted 8 July, 2024;
originally announced July 2024.
-
When is the consistent prediction likely to be a correct prediction?
Authors:
Alex Nguyen,
Dheeraj Mekala,
Chengyu Dong,
Jingbo Shang
Abstract:
Self-consistency (Wang et al., 2023) suggests that the most consistent answer obtained through large language models (LLMs) is more likely to be correct. In this paper, we challenge this argument and propose a nuanced correction. Our observations indicate that consistent answers derived through more computation, i.e., longer reasoning texts, rather than simply the most consistent answer across all outputs, are more likely to be correct. This is predominantly because we demonstrate that LLMs can autonomously produce chain-of-thought (CoT) style reasoning with no custom prompts merely while generating longer responses, which leads to consistent predictions that are more accurate. In the zero-shot setting, by sampling the Mixtral-8x7B model multiple times and considering longer responses, we achieve 86% of its self-consistency performance obtained through zero-shot CoT prompting on the GSM8K and MultiArith datasets. Finally, we demonstrate that the probability of LLMs generating a longer response is quite low, highlighting the need for decoding strategies conditioned on output length.
Submitted 8 July, 2024;
originally announced July 2024.
-
Cosmological constraints in symmetric teleparallel gravity with bulk viscosity
Authors:
Dheeraj Singh Rana,
P. K. Sahoo
Abstract:
In this study, we explore the accelerated expansion of the universe within the framework of modified $f(Q)$ gravity. The investigation focuses on the role of bulk viscosity in understanding the universe's accelerated expansion. Specifically, a bulk viscous matter-dominated cosmological model is considered, with the bulk viscosity coefficient expressed as $ζ = ζ_0 ρ H^{-1} + ζ_1 H$. We consider the power law $f(Q)$ function $f(Q) = αQ^n$, where $α$ and $n$ are arbitrary constants, and derive the analytical solutions for the field equations corresponding to a flat FLRW metric. Subsequently, we use the combined Cosmic Chronometers (CC)+Pantheon+SH0ES sample to estimate the free parameters of the obtained analytic solution. We conduct Bayesian statistical analysis to estimate the posterior probability by employing the likelihood function and the MCMC random sampling technique, along with the AIC and BIC statistical assessment criteria. In addition, we explore the evolutionary behavior of significant cosmological parameters. The effective equation of state (EOS) parameter predicts the accelerating behavior of the cosmic expansion phase. Further, using the statefinder and $Om(z)$ diagnostic tests, we find that our viscous model favors quintessence-type behavior and can successfully describe the late-time scenario.
Submitted 5 July, 2024;
originally announced July 2024.
-
On A Potential Contact Analogue Of Kirby Move Of Type 1
Authors:
Prerak Deep,
Dheeraj Kulkarni
Abstract:
In this expository note, we explore the possibility of the existence of Kirby move of type 1 for contact surgery diagrams. In particular, we give the necessary conditions on a contact surgery diagram to become a potential candidate for contact Kirby move of type 1. We observe that there is a collection of contact positive integral surgery diagrams on Legendrian unknots satisfying those conditions.
Submitted 21 October, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Authors:
Benno Krojer,
Dheeraj Vattikonda,
Luis Lara,
Varun Jampani,
Eva Portelance,
Christopher Pal,
Siva Reddy
Abstract:
An image editing model should be able to perform diverse edits, ranging from object replacement, changing attributes or style, to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action and reasoning-centric edits. Object, attribute or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action and reasoning-centric edits is scarce and has to come from entirely different sources that cover e.g. physical dynamics, temporality and spatial reasoning. To this end, we meticulously curate the AURORA Dataset (Action-Reasoning-Object-Attribute), a collection of high-quality training data, human-annotated and curated from videos and simulation engines. We focus on a key aspect of quality training data: triplets (source image, prompt, target image) contain a single meaningful visual change described by the prompt, i.e., truly minimal changes between source and target images. To demonstrate the value of our dataset, we evaluate an AURORA-finetuned model on a new expert-curated benchmark (AURORA-Bench) covering 8 diverse editing tasks. Our model significantly outperforms previous editing models as judged by human raters. For automatic evaluations, we find important flaws in previous metrics and caution against their use for semantically hard editing tasks. Instead, we propose a new automatic metric that focuses on discriminative understanding. We hope that our efforts: (1) curating a quality training dataset and an evaluation benchmark, (2) developing critical evaluations, and (3) releasing a state-of-the-art model, will fuel further progress on general image editing.
Submitted 17 October, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Authors:
Ibrahim Abdelaziz,
Kinjal Basu,
Mayank Agarwal,
Sadhana Kumaravel,
Matthew Stallone,
Rameswar Panda,
Yara Rizk,
GP Bhargav,
Maxwell Crouse,
Chulaka Gunasekara,
Shajith Ikbal,
Sachin Joshi,
Hima Karanam,
Vineet Kumar,
Asim Munawar,
Sumit Neelam,
Dinesh Raghu,
Udit Sharma,
Adriana Meza Soria,
Dheeraj Sreedhar,
Praveen Venkateswaran,
Merve Unuvar,
David Cox,
Salim Roukos,
Luis Lastras,
et al. (1 additional author not shown)
Abstract:
Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the true potential of LLMs as autonomous agents, they must learn to identify, call, and interact with external tools and application program interfaces (APIs) to complete complex tasks. These tasks together are termed function calling. Endowing LLMs with function calling abilities leads to a myriad of advantages, such as access to current and domain-specific information in databases and knowledge sources, and the ability to outsource tasks that can be reliably performed by tools, e.g., a Python interpreter or calculator. While there has been significant progress in function calling with LLMs, there is still a dearth of open models that perform on par with proprietary LLMs like GPT, Claude, and Gemini. Therefore, in this work, we introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks encompassed in function calling, those being Nested Function Calling, Function Chaining, Parallel Functions, Function Name Detection, Parameter-Value Pair Detection, Next-Best Function, and Response Generation. We present a comprehensive evaluation on multiple out-of-domain datasets comparing GRANITE-20B-FUNCTIONCALLING to more than 15 other best proprietary and open models. GRANITE-20B-FUNCTIONCALLING provides the best performance among all open models on the Berkeley Function Calling Leaderboard and fourth overall. As a result of the diverse tasks and datasets used for training our model, we show that GRANITE-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.
Submitted 27 June, 2024;
originally announced July 2024.