-
Efficient Ionizers with Low H$\boldsymbolβ$+[OIII] Equivalent Widths: JADES Spectroscopy of a Peculiar High-z Population
Authors:
Isaac H. Laseter,
Michael V. Maseda,
Charlotte Simmonds,
Ryan Endsley,
Daniel Stark,
Andrew J. Bunker,
Rachana Bhatawdekar,
Kristan Boyett,
Alex J. Cameron,
Stefano Carniani,
Mirko Curti,
Zhiyuan Ji,
Pierluigi Rinaldi,
Aayush Saxena,
Sandro Tacchella,
Chris Willott,
Joris Witstok,
Yongda Zhu
Abstract:
Early JWST photometric studies discovered a population of UV faint ($\rm <L^{*}_{UV}$) $z \sim 6.5-8$ Lyman break galaxies with spectral energy distributions implying young ages ($\sim10$ Myr) yet relatively weak H$β$+[OIII] equivalent widths ($\rm EW_{Hβ+[OIII]} \approx 400$Å). These galaxies seemingly contradict the implicit understanding that young star-forming galaxies are ubiquitously strong H$β$+[OIII] emitters, i.e., extreme emission line galaxies (EW $\rm \gtrsim 750$Å). Low metallicities, high Lyman continuum escape fractions, and rapidly declining star-formation histories have been proposed as primary drivers behind low H$β$+[OIII] equivalent widths, but the blend of H$β$+[OIII] in photometric studies makes proving one of these scenarios difficult. We aim to characterize this peculiar population with deep spectroscopy from the JWST Advanced Deep Extragalactic Survey (JADES). We find that a significant subset of these galaxies at $z\gtrsim2$ with modest H$β$+[OIII] equivalent widths ($\rm \approx 300-600$Å) have high ionization efficiencies ($\rm \log ξ_{ion} \gtrsim 25.5~[Hz~erg^{-1}]$). Suppressed [OIII] EW values yet elevated H$α$ and H$β$ EW values imply that the level of chemical enrichment is the primary culprit, supported by spectroscopic measurements of metallicities below 12+log(O/H)$\rm \approx 7.70~(10\%Z_{\odot})$. We demonstrate that integrated H$β$+[OIII] selections (e.g., H$β$+[OIII] EW $> 700$Å) exclude the most metal-poor efficient ionizers and favor 1) more chemically enriched systems with comparable extreme radiation fields and 2) older starbursting systems. In contrast, metallicity degeneracies are reduced in H$α$ space, enabling the identification of these metal-poor efficient ionizers by their specific star-formation rate.
Submitted 5 December, 2024;
originally announced December 2024.
-
Multi-view Image Diffusion via Coordinate Noise and Fourier Attention
Authors:
Justin Theiss,
Norman Müller,
Daeil Kim,
Aayush Prakash
Abstract:
Recently, text-to-image generation with diffusion models has made significant advancements in both higher fidelity and generalization capabilities compared to previous baselines. However, generating holistic multi-view consistent images from prompts still remains an important and challenging task. To address this challenge, we propose a diffusion process that attends to time-dependent spatial frequencies of features with a novel attention mechanism, as well as a novel noise initialization technique and a cross-attention loss. This Fourier-based attention block focuses on features from non-overlapping regions of the generated scene in order to better align the global appearance. Our noise initialization technique incorporates shared noise and low spatial frequency information derived from pixel coordinates and depth maps to induce noise correlations across views. The cross-attention loss further aligns features sharing the same prompt across the scene. Our technique improves SOTA on several quantitative metrics with qualitatively better results when compared to other state-of-the-art approaches for multi-view consistency.
Submitted 4 December, 2024;
originally announced December 2024.
-
Moduli of real (res. quaternionic) L-connections
Authors:
Ayush Jaiswal
Abstract:
We study irreducible real (respectively, quaternionic) Lie algebroid connections and prove that the gauge-theoretic moduli space has a Hausdorff Hilbert manifold structure. This work generalises some known results about simple semi-connections for a complex vector bundle on a compact complex manifold in real algebraic geometry.
Submitted 3 December, 2024;
originally announced December 2024.
-
MeasureNet: Measurement Based Celiac Disease Identification
Authors:
Aayush Kumar Tyagi,
Vaibhav Mishra,
Ashok Tiwari,
Lalita Mehra,
Prasenjit Das,
Govind Makharia,
Prathosh AP,
Mausam
Abstract:
Celiac disease is an autoimmune disorder triggered by the consumption of gluten. It causes damage to the villi, the finger-like projections in the small intestine that are responsible for nutrient absorption. Additionally, the crypts, which form the base of the villi, are also affected, impairing the regenerative process. The deterioration in villi length, computed as the villi-to-crypt length ratio, indicates the severity of celiac disease. However, manual measurement of villi-crypt length can be both time-consuming and susceptible to inter-observer variability, leading to inconsistencies in diagnosis. While some methods can perform measurement as a post-hoc process, they are prone to errors in the initial stages. This gap underscores the need for pathologically driven solutions that enhance measurement accuracy and reduce human error in celiac disease assessments.
Our proposed method, MeasureNet, is a pathologically driven polyline detection framework incorporating polyline localization and object-driven losses specifically designed for measurement tasks. Furthermore, we leverage a segmentation model to provide auxiliary guidance about crypt location when crypts are partially visible. To ensure that the model is not overdependent on the segmentation mask, we enhance model robustness through a mask feature mixup technique. Additionally, we introduce a novel dataset for grading celiac disease, consisting of 750 annotated duodenum biopsy images. MeasureNet achieves an 82.66% classification accuracy for binary classification and 81% accuracy for multi-class grading of celiac disease. Code: https://github.com/dair-iitd/MeasureNet
Submitted 2 December, 2024;
originally announced December 2024.
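The villi-to-crypt length ratio described in the MeasureNet abstract above can be illustrated with a small sketch. It assumes each villus and crypt has already been traced as a polyline of (x, y) pixel coordinates; the function names and sample coordinates are hypothetical, and this is only the measurement step, not the MeasureNet detection pipeline.

```python
import math


def polyline_length(points):
    """Sum of straight-line segment lengths along a polyline of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def villus_to_crypt_ratio(villus, crypt):
    """Ratio of villus length to crypt length; both inputs are polylines."""
    crypt_length = polyline_length(crypt)
    if crypt_length == 0:
        raise ValueError("crypt polyline has zero length")
    return polyline_length(villus) / crypt_length


# Hypothetical traces in pixel coordinates, chosen for easy arithmetic.
villus = [(0, 0), (0, 3), (4, 3)]  # segment lengths 3 + 4
crypt = [(0, 0), (3, 4)]           # single segment of length 5
ratio = villus_to_crypt_ratio(villus, crypt)
```

Because the ratio is dimensionless, pixel units cancel, so no physical calibration is needed for this step as long as both polylines come from the same image.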
-
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
Authors:
Ayush Mohanty,
Jason Dekarske,
Stephen K. Robinson,
Sanjay Joshi,
Nagi Gebraeel
Abstract:
Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation is reflected is in the position accuracy of the robot's end-effector. In this paper, we present a prognostic modeling framework that predicts a robotic manipulator's Remaining Useful Life (RUL) while accounting for the effects of task severity. Our framework represents the robot's position accuracy as a Brownian motion process with a random drift parameter that is influenced by task severity. The dynamic nature of task severity is modeled using a continuous-time Markov chain (CTMC). To evaluate RUL, we discuss two approaches -- (1) a novel closed-form expression for Remaining Lifetime Distribution (RLD), and (2) Monte Carlo simulations, commonly used in prognostics literature. Theoretical results establish the equivalence between these RUL computation approaches. We validate our framework through experiments using two distinct physics-based simulators for planar and spatial robot fleets. Our findings show that robots in both fleets experience shorter RUL when handling a higher proportion of high-severity tasks.
Submitted 30 November, 2024;
originally announced December 2024.
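The Monte Carlo route to remaining useful life mentioned in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' model: it assumes a two-state severity chain (low/high), a fixed drift per state rather than a random drift parameter, and an Euler time discretization. All function names, rates, drifts, and thresholds are hypothetical.

```python
import random


def simulate_failure_time(drifts, leave_rates, sigma, threshold,
                          rng, dt=0.01, max_time=200.0):
    """Simulate one degradation path; return its first threshold-crossing time.

    drifts[s]      : degradation drift in severity state s (0 = low, 1 = high)
    leave_rates[s] : CTMC rate of leaving state s
    Returns max_time if the threshold is never reached (a censored path).
    """
    state, level, t = 0, 0.0, 0.0
    while t < max_time:
        # Two-state CTMC, discretized: switch with probability ~ rate * dt.
        if rng.random() < leave_rates[state] * dt:
            state = 1 - state
        # Euler step of Brownian motion with state-dependent drift.
        level += drifts[state] * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0)
        t += dt
        if level >= threshold:
            return t
    return max_time


def mean_failure_time(n_paths, seed, **model):
    """Average first-passage time over n_paths simulated paths."""
    rng = random.Random(seed)
    total = sum(simulate_failure_time(rng=rng, **model) for _ in range(n_paths))
    return total / n_paths


common = dict(drifts=(0.5, 2.0), sigma=0.2, threshold=5.0)
# Robot that spends most of its time on low-severity tasks.
mostly_low = mean_failure_time(200, seed=0, leave_rates=(0.1, 1.0), **common)
# Robot that spends most of its time on high-severity tasks.
mostly_high = mean_failure_time(200, seed=0, leave_rates=(1.0, 0.1), **common)
```

Under these assumed parameters, the mostly-high-severity robot crosses the threshold sooner on average, which is the qualitative effect the abstract reports.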
-
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Authors:
Angelika Romanou,
Negar Foroutan,
Anna Sotnikova,
Zeming Chen,
Sree Harsha Nelaturu,
Shivalika Singh,
Rishabh Maheshwary,
Micol Altomare,
Mohamed A. Haggag,
Snegha A,
Alfonso Amayuelas,
Azril Hafizi Amirudin,
Viraat Aryabumi,
Danylo Boiko,
Michael Chang,
Jenny Chim,
Gal Cohen,
Aditya Kumar Dalmia,
Abraham Diress,
Sharad Duwal,
Daniil Dzenhaliou,
Daniel Fernando Erazo Florez,
Fabian Farestam,
Joseph Marvin Imperial,
Shayekh Bin Islam
, et al. (34 additional authors not shown)
Abstract:
The performance differential of large language models (LLMs) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other than English. Moreover, current practices in multilingual benchmark construction often translate English resources, ignoring the regional and cultural knowledge of the environments in which multilingual systems would be used. In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts. Our novel resource, INCLUDE, is a comprehensive knowledge- and reasoning-centric benchmark across 44 written languages that evaluates multilingual LLMs for performance in the actual language environments where they would be deployed.
Submitted 29 November, 2024;
originally announced November 2024.
-
Textured Gaussians for Enhanced 3D Scene Appearance Modeling
Authors:
Brian Chao,
Hung-Yu Tseng,
Lorenzo Porzi,
Chen Gao,
Tuotuo Li,
Qinbo Li,
Ayush Saraf,
Jia-Bin Huang,
Johannes Kopf,
Gordon Wetzstein,
Changil Kim
Abstract:
3D Gaussian Splatting (3DGS) has recently emerged as a state-of-the-art 3D reconstruction and rendering technique due to its high-quality results and fast training and rendering time. However, pixels covered by the same Gaussian are always shaded in the same color up to a Gaussian falloff scaling factor. Furthermore, the finest geometric detail any individual Gaussian can represent is a simple ellipsoid. These properties of 3DGS greatly limit the expressivity of individual Gaussian primitives. To address these issues, we draw inspiration from texture and alpha mapping in traditional graphics and integrate it with 3DGS. Specifically, we propose a new generalized Gaussian appearance representation that augments each Gaussian with alpha (A), RGB, or RGBA texture maps to model spatially varying color and opacity across the extent of each Gaussian. As such, each Gaussian can represent a richer set of texture patterns and geometric structures, instead of just a single color and ellipsoid as in naive Gaussian Splatting. Surprisingly, we found that the expressivity of Gaussians can be greatly improved by using alpha-only texture maps, and further augmenting Gaussians with RGB texture maps achieves the highest expressivity. We validate our method on a wide variety of standard benchmark datasets and our own custom captures at both the object and scene levels. We demonstrate image quality improvements over existing methods while using a similar or lower number of Gaussians.
Submitted 28 May, 2025; v1 submitted 27 November, 2024;
originally announced November 2024.
-
Monster radio jet (>66 kpc) observed in quasar at z$\sim$5
Authors:
Anniek J. Gloudemans,
Frits Sweijen,
Leah K. Morabito,
Emanuele Paolo Farina,
Kenneth J. Duncan,
Yuichi Harikane,
Huub J. A. Röttgering,
Aayush Saxena,
Jan-Torge Schindler
Abstract:
We present the discovery of a large extended radio jet associated with the extremely radio-loud quasar J1601+3102 at $z\sim5$ from sub-arcsecond resolution imaging at 144 MHz with the LOFAR International Telescope. These large radio lobes have been argued to remain elusive at $z>4$ due to energy losses in the synchrotron emitting plasma as a result of scattering of the strong CMB at these high redshifts. Nonetheless, the 0.3" resolution radio image of J1601+3102 reveals a Northern and Southern radio lobe located at 9 and 57 kpc from the optical quasar, respectively. The measured jet size of 66 kpc makes J1601+3102 the largest extended radio jet at $z>4$ to date. However, it is expected to have an even larger physical size in reality due to projection effects brought about by the viewing angle. Furthermore, we observe the rest-frame UV spectrum of J1601+3102 with Gemini/GNIRS to examine its black hole properties, which results in a mass of 4.5$\times$10$^{8}$ M$_{\odot}$ with an Eddington luminosity ratio of 0.45. The BH mass is relatively low compared to the known high-$z$ quasar population, which suggests that a high BH mass is not strictly necessary to generate a powerful jet. This discovery of the first $\sim100$ kpc radio jet at $z>4$ shows that these objects exist despite energy losses from Inverse Compton scattering and can put invaluable constraints on the formation of the first radio-loud sources in the early Universe.
Submitted 25 November, 2024;
originally announced November 2024.
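The numbers quoted in the abstract above can be cross-checked with a short calculation. The sketch assumes the standard Eddington luminosity coefficient for ionized hydrogen, about $1.26\times10^{38}$ erg s$^{-1}$ per solar mass; the implied bolometric luminosity is derived here for illustration and is not quoted in the abstract. Variable names are hypothetical.

```python
# Standard Eddington coefficient for ionized hydrogen (assumption of this sketch).
L_EDD_PER_SOLAR_MASS = 1.26e38  # erg/s per solar mass


def eddington_luminosity(mass_msun):
    """Eddington luminosity in erg/s for a black hole mass in solar masses."""
    return L_EDD_PER_SOLAR_MASS * mass_msun


bh_mass = 4.5e8         # solar masses, from the abstract
eddington_ratio = 0.45  # from the abstract

l_edd = eddington_luminosity(bh_mass)
# Implied bolometric luminosity: derived here, not reported in the abstract.
l_bol = eddington_ratio * l_edd

# Projected jet size: northern plus southern lobe distances from the quasar.
projected_jet_size_kpc = 9 + 57
```

The lobe distances of 9 and 57 kpc add up to the quoted 66 kpc, and the quoted mass and Eddington ratio imply a bolometric luminosity of order $10^{46}$ erg s$^{-1}$ under the assumed coefficient.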
-
Non-Local Classical Field Theory with Fractional Operators on $\mathbb{S}^3 \times \mathbb{R}^1$ Space
Authors:
Abhi Savaliya,
Ayush Bidlan
Abstract:
We present a theoretical framework on non-local classical field theory using fractional integrodifferential operators. Due to the lack of easily manageable symmetries in traditional fractional calculus and the difficulties that arise in the formalism of multi-fractional calculus over $\mathbb{R}^{\text{D}}$ space, we introduce a set of new fractional operators over the $\mathbb{S}^3 \times \mathbb{R}^1$ space. The redefined fractional integral operator results in the non-trivial measure canonically, and they can account for the spacetime symmetries for the underlying space $\mathbb{S}^3 \times \mathbb{R}^1$ with the Lorentzian signature $(+, -, -, -, -)$. We conclude that the field equation for the non-local classical field can be obtained as the consequence of the optimisation of the action by employing the non-local variations in the field after defining the non-local Lagrangian density, namely, $\mathcal{L}(φ_{a}\left(x\right), \mathbbð^αφ_{a}\left(x\right))$, as the function of the symmetric fractional derivative of the field, e.g. in the context of the kinetic term, and the field itself.
Submitted 15 December, 2024; v1 submitted 23 November, 2024;
originally announced November 2024.
-
LoRA-Mini : Adaptation Matrices Decomposition and Selective Training
Authors:
Ayush Singh,
Rajdeep Aher,
Shivank Garg
Abstract:
The rapid advancements in large language models (LLMs) have revolutionized natural language processing, creating an increased need for efficient, task-specific fine-tuning methods. Traditional fine-tuning of LLMs involves updating a large number of parameters, which is computationally expensive and memory-intensive. Low-Rank Adaptation (LoRA) has emerged as a promising solution, enabling parameter-efficient fine-tuning by reducing the number of trainable parameters. However, while LoRA reduces the number of trainable parameters, LoRA modules still create significant storage challenges. We propose LoRA-Mini, an optimized adaptation of LoRA that improves parameter efficiency by splitting low-rank matrices into four parts, with only the two inner matrices being trainable. This approach achieves up to a 20x reduction compared to standard LoRA in the number of trainable parameters while preserving performance levels comparable to standard LoRA, addressing both computational and storage efficiency in LLM fine-tuning.
Submitted 24 November, 2024;
originally announced November 2024.
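To see why training only the two inner matrices shrinks the trainable parameter count, consider the following sketch. The abstract above does not specify matrix shapes, so the four-factor layout used here (two frozen outer matrices around two trainable inner ones, joined by a hypothetical `inner_dim`) is an assumption, as are the layer sizes and function names.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Standard LoRA: update = B @ A with B (d_out x rank) and A (rank x d_in),
    both trainable."""
    return rank * (d_in + d_out)


def lora_mini_trainable_params(rank, inner_dim):
    """Assumed four-factor split of the update:
        B_outer (d_out x inner_dim)  frozen
        B_inner (inner_dim x rank)   trainable
        A_inner (rank x inner_dim)   trainable
        A_outer (inner_dim x d_in)   frozen
    Only the inner pair contributes trainable parameters."""
    return 2 * inner_dim * rank


# Hypothetical layer sizes, not taken from the paper.
d_in = d_out = 4096
rank, inner_dim = 16, 64

standard = lora_trainable_params(d_in, d_out, rank)
mini = lora_mini_trainable_params(rank, inner_dim)
reduction = standard / mini
```

The key point is that the trainable count no longer scales with `d_in` or `d_out`. The reduction factor obtained here depends entirely on the assumed sizes and should not be compared with the figure reported in the abstract.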
-
Quantum Advantage via Solving Multivariate Quadratics
Authors:
Pierre Briaud,
Riddhi Ghosal,
Aayush Jain,
Paul Lou,
Amit Sahai
Abstract:
In this work, we propose a new way to (non-interactively, verifiably) demonstrate Quantum Advantage by solving the average-case $\mathsf{NP}$ search problem of finding a solution to a system of (underdetermined) multivariate quadratic equations over the finite field $\mathbb{F}_2$ drawn from a specified distribution. In particular, we design a distribution of degree-2 polynomials $\{p_i(x_1,\ldots,x_n)\}_{i\in [m]}$ for $m<n$ over $\mathbb{F}_2$ for which we show that there is a quantum polynomial-time algorithm that simultaneously solves $\{p_i(x_1,\ldots,x_n)=y_i\}_{i\in [m]}$ for a random vector $(y_1,\ldots,y_m)$. On the other hand, while a solution exists with high probability, we conjecture that it is classically hard to find one based on classical cryptanalysis that we provide, including a comprehensive review of all known relevant classical algorithms for solving multivariate quadratics. Our approach proceeds by examining the Yamakawa-Zhandry (FOCS 2022) quantum advantage scheme and replacing the role of the random oracle with our multivariate quadratic equations. Our work therefore gives several new perspectives:
First, our algorithm gives a counterexample to the conventional belief that generic classically hard multivariate quadratic systems are also quantumly hard.
Second, based on cryptanalytic evidence, our work gives an explicit simple replacement for the random oracle from the work of Yamakawa and Zhandry. We show how to instantiate the random oracle with families of just degree two multivariate polynomials over $\mathbb{F}_2$.
Submitted 27 November, 2024; v1 submitted 21 November, 2024;
originally announced November 2024.
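The search problem in the abstract above, finding $x$ with $p_i(x)=y_i$ for $m<n$ quadratic polynomials over $\mathbb{F}_2$, can be made concrete with a toy classical sketch. It uses uniformly random polynomials and a planted solution, which is not the specific distribution designed in the paper, and an exhaustive search that is exponential in $n$, not the quantum algorithm. All names and sizes are hypothetical.

```python
import itertools
import random


def random_quadratic(n, rng):
    """Uniformly random degree-2 polynomial over F_2.

    Since x_j^2 = x_j over F_2, square terms are folded into the linear part.
    """
    quad = [(j, k) for j in range(n) for k in range(j + 1, n) if rng.random() < 0.5]
    lin = [j for j in range(n) if rng.random() < 0.5]
    const = rng.randint(0, 1)
    return quad, lin, const


def evaluate(poly, x):
    """Evaluate a polynomial at a bit vector x; XOR is addition, AND is product."""
    quad, lin, const = poly
    value = const
    for j, k in quad:
        value ^= x[j] & x[k]
    for j in lin:
        value ^= x[j]
    return value


def brute_force_solve(system, y, n):
    """Exhaustive classical search over all 2^n inputs; tiny instances only."""
    for x in itertools.product((0, 1), repeat=n):
        if all(evaluate(p, x) == y_i for p, y_i in zip(system, y)):
            return x
    return None


rng = random.Random(1)
n, m = 6, 4  # underdetermined system: m < n
system = [random_quadratic(n, rng) for _ in range(m)]
planted = tuple(rng.randint(0, 1) for _ in range(n))
y = [evaluate(p, planted) for p in system]  # planting guarantees solvability
solution = brute_force_solve(system, y, n)
```

The search may return a solution different from the planted one, since an underdetermined system typically has several. Verifying a candidate only requires re-evaluating the polynomials, which is what makes the demonstration efficiently checkable.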
-
Hitting the slopes: A spectroscopic view of UV continuum slopes of galaxies reveals a reddening at z > 9.5
Authors:
Aayush Saxena,
Alex J. Cameron,
Harley Katz,
Andrew J. Bunker,
Jacopo Chevallard,
Francesco D'Eugenio,
Santiago Arribas,
Rachana Bhatawdekar,
Kristan Boyett,
Phillip A. Cargile,
Stefano Carniani,
Stephane Charlot,
Mirko Curti,
Emma Curtis-Lake,
Kevin Hainline,
Zhiyuan Ji,
Benjamin D. Johnson,
Gareth C. Jones,
Nimisha Kumari,
Isaac Laseter,
Michael V. Maseda,
Brant Robertson,
Charlotte Simmonds,
Sandro Tacchella,
Hannah Ubler
, et al. (4 additional authors not shown)
Abstract:
The UV continuum slope of galaxies, $β$, is a powerful diagnostic. Understanding the redshift evolution of $β$ and its dependence on key galaxy properties can shed light on the evolution of galaxy physical properties over cosmic time. In this study, we present $β$ measurements for 295 spectroscopically confirmed galaxies at $5.5<z<14.3$ selected primarily from JADES, where $β$ has been measured from high quality JWST NIRSpec/PRISM spectra. We find a median $β=-2.3$ across our full sample, and find a mild increase in blueness of $β$ with increasing redshift and fainter UV magnitudes. Interestingly, we find evidence for the average $β$ at $z > 9.5$ to begin to redden, deviating from the trend observed at $z < 9.5$. By producing stacked spectra in bins of redshift and $β$, we derive trends between $β$ and dust attenuation, metallicity, ionization parameter, and stellar age indicators directly from spectra, finding a lack of dust attenuation to be the dominant driver of bluer $β$ values. We further report six galaxies with $β<-3.0$, which show a range of spectroscopic properties and signs of significant LyC photon leakage. Finally, we show that the redder $β$ values at $z > 9.5$ may require rapid build-up of dust reservoirs in the very early Universe or a significant contribution from the nebular continuum emission to the observed UV spectra, with the nebular continuum fraction depending on the gas temperatures and densities. Our modeling shows that in the absence of dust, nebular emission at $T > 15,000$ K can reproduce the range of $β$ that we see in our sample. Higher gas temperatures driven by hot, massive stars can boost the fraction of nebular continuum emission, potentially explaining the observed $β$ values as well as bright UV magnitudes seen across galaxies at $z > 10$.
Submitted 10 December, 2024; v1 submitted 21 November, 2024;
originally announced November 2024.
-
Coloring triangles in graphs
Authors:
Ayush Basu,
Vojtěch Rödl,
Marcelo Sales
Abstract:
We study quantitative aspects of the following fact: For every graph $F$, there exists a graph $G$ with the property that any $2$-coloring of the triangles of $G$ yields an induced copy of $F$, in which all triangles are monochromatic. We define the Ramsey number $R_{\text{ind}}^Δ(F)$ as the smallest size of such a graph $G$. Although this fact has several proofs, all of them provide tower-type bounds. We study the number $R_{\text{ind}}^Δ(F)$ for some particular classes of graphs $F$.
Submitted 20 November, 2024;
originally announced November 2024.
-
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Authors:
Rohith Peddi,
Saurabh,
Ayush Abhay Shrivastava,
Parag Singla,
Vibhav Gogate
Abstract:
Spatio-Temporal Scene Graphs (STSGs) provide a concise and expressive representation of dynamic scenes by modeling objects and their evolving relationships over time. However, real-world visual relationships often exhibit a long-tailed distribution, causing existing methods for tasks like Video Scene Graph Generation (VidSGG) and Scene Graph Anticipation (SGA) to produce biased scene graphs. To this end, we propose ImparTail, a novel training framework that leverages loss masking and curriculum learning to mitigate bias in the generation and anticipation of spatio-temporal scene graphs. Unlike prior methods that add extra architectural components to learn unbiased estimators, we propose an impartial training objective that reduces the dominance of head classes during learning and focuses on underrepresented tail relationships. Our curriculum-driven mask generation strategy further empowers the model to adaptively adjust its bias mitigation strategy over time, enabling more balanced and robust estimations. To thoroughly assess performance under various distribution shifts, we also introduce two new tasks, Robust Spatio-Temporal Scene Graph Generation and Robust Scene Graph Anticipation, offering a challenging benchmark for evaluating the resilience of STSG models. Extensive experiments on the Action Genome dataset demonstrate the superior unbiased performance and robustness of our method compared to existing baselines.
Submitted 24 March, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
-
Barttender: An approachable & interpretable way to compare medical imaging and non-imaging data
Authors:
Ayush Singla,
Shakson Isaac,
Chirag J. Patel
Abstract:
Imaging-based deep learning has transformed healthcare research, yet its clinical adoption remains limited due to challenges in comparing imaging models with traditional non-imaging and tabular data. To bridge this gap, we introduce Barttender, an interpretable framework that uses deep learning for the direct comparison of the utility of imaging versus non-imaging tabular data for tasks like disease prediction.
Barttender converts non-imaging tabular features, such as scalar data from electronic health records, into grayscale bars, facilitating an interpretable and scalable deep learning based modeling of both data modalities. Our framework allows researchers to evaluate differences in utility through performance measures, as well as local (sample-level) and global (population-level) explanations. We introduce a novel measure to define global feature importances for image-based deep learning models, which we call gIoU. Experiments on the CheXpert and MIMIC datasets with chest X-rays and scalar data from electronic health records show that Barttender performs comparably to traditional methods and offers enhanced explainability using deep learning models.
Submitted 19 November, 2024;
originally announced November 2024.
-
Sensor-fusion based Prognostics Framework for Complex Engineering Systems Exhibiting Multiple Failure Modes
Authors:
Benjamin Peters,
Ayush Mohanty,
Xiaolei Fang,
Stephen K. Robinson,
Nagi Gebraeel
Abstract:
Complex engineering systems are often subject to multiple failure modes. Developing a remaining useful life (RUL) prediction model that does not consider the failure mode causing degradation is likely to result in inaccurate predictions. However, distinguishing between causes of failure without manually inspecting the system is nontrivial. This challenge is increased when the causes of historically observed failures are unknown. Sensors, which are useful for monitoring the state-of-health of systems, can also be used for distinguishing between multiple failure modes as the presence of multiple failure modes results in discriminatory behavior of the sensor signals. When systems are equipped with multiple sensors, some sensors may exhibit behavior correlated with degradation, while other sensors do not. Furthermore, which sensors exhibit this behavior may differ for each failure mode. In this paper, we present a simultaneous clustering and sensor selection approach for unlabeled training datasets of systems exhibiting multiple failure modes. The cluster assignments and the selected sensors are then utilized in real-time to first diagnose the active failure mode and then to predict the system RUL. We validate the methodology using a simulated dataset of systems exhibiting two failure modes and on the NASA turbofan degradation dataset.
Submitted 10 March, 2025; v1 submitted 18 November, 2024;
originally announced November 2024.
-
Formation of Compact Hierarchical Triples
Authors:
Ayush Moharana,
K. G. Helminiak,
T. Pawar,
G. Pawar
Abstract:
Compact hierarchical triples (CHTs) are triple-star systems in which the tertiary orbit has a period of less than 1000 d. They were thought to be rare, but more of these systems have recently been discovered thanks to space-based missions like TESS, Kepler, and Gaia. In this work, we use orbital parameters obtained from these missions to constrain the formation process of CHTs. We also use spectroscopic and systemic parameters from our work and the literature to understand the effects of metallicity and dynamics on the formation processes.
Submitted 18 November, 2024;
originally announced November 2024.
-
Syllabus: Portable Curricula for Reinforcement Learning Agents
Authors:
Ryan Sullivan,
Ryan Pégoud,
Ameen Ur Rahmen,
Xinchen Yang,
Junyun Huang,
Aayush Verma,
Nistha Mitra,
John P. Dickerson
Abstract:
Curriculum learning has been a quiet yet crucial component of many of the high-profile successes of reinforcement learning. Despite this, none of the major reinforcement learning libraries directly support curriculum learning or include curriculum learning implementations. These methods can improve the capabilities and robustness of RL agents, but often require significant, complex changes to agent training code. We introduce Syllabus, a library for training RL agents with curriculum learning, as a solution to this problem. Syllabus provides a universal API for curriculum learning algorithms, implementations of popular curriculum learning methods, and infrastructure for easily integrating them with distributed training code written in nearly any RL library. Syllabus provides a minimal API for each of the core components of curriculum learning, dramatically simplifying the process of designing new algorithms and applying existing algorithms to new environments. We demonstrate that the same Syllabus code can be used to train agents written in multiple different RL libraries on numerous domains. In doing so, we present the first examples of curriculum learning in NetHack and Neural MMO, two of the premier challenges for single-agent and multi-agent RL respectively, achieving strong results compared to state of the art baselines.
Submitted 18 November, 2024;
originally announced November 2024.
-
PickScan: Object discovery and reconstruction from handheld interactions
Authors:
Vincent van der Brugge,
Marc Pollefeys,
Joshua B. Tenenbaum,
Ayush Tewari,
Krishna Murthy Jatavallabhula
Abstract:
Reconstructing compositional 3D representations of scenes, where each object is represented with its own 3D model, is a highly desirable capability in robotics and augmented reality. However, most existing methods rely heavily on strong appearance priors for object discovery, therefore only working on those classes of objects on which the method has been trained, or do not allow for object manipulation, which is necessary to scan objects fully and to guide object discovery in challenging scenarios. We address these limitations with a novel interaction-guided and class-agnostic method based on object displacements that allows a user to move around a scene with an RGB-D camera, hold up objects, and finally outputs one 3D model per held-up object. Our main contribution to this end is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects. On a custom-captured dataset, our pipeline discovers manipulated objects with 78.3% precision at 100% recall and reconstructs them with a mean chamfer distance of 0.90cm. Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, this corresponds to a reduction in chamfer distance of 73% while detecting 99% fewer false positives.
Submitted 17 November, 2024;
originally announced November 2024.
-
Multi Scale Graph Neural Network for Alzheimer's Disease
Authors:
Anya Chauhan,
Ayush Noori,
Zhaozhi Li,
Yingnan He,
Michelle M Li,
Marinka Zitnik,
Sudeshna Das
Abstract:
Alzheimer's disease (AD) is a complex, progressive neurodegenerative disorder characterized by extracellular Aβ plaques, neurofibrillary tau tangles, glial activation, and neuronal degeneration, involving multiple cell types and pathways. Current models often overlook the cellular context of these pathways. To address this, we developed a multiscale graph neural network (GNN) model, ALZ PINNACLE, using brain omics data from donors spanning the entire aging to AD spectrum. ALZ PINNACLE is based on the PINNACLE GNN framework, which learns context-aware protein, cell type, and tissue representations within a unified latent space. ALZ PINNACLE was trained on 14,951 proteins, 206,850 protein interactions, 7 cell types, and 48 cell subtypes or states. After pretraining, we investigated the learned embedding of APOE, the largest genetic risk factor for AD, across different cell types. Notably, APOE embeddings showed high similarity in microglial, neuronal, and CD8 cells, suggesting a similar role of APOE in these cell types. Fine tuning the model on AD risk genes revealed cell type contexts predictive of the role of APOE in AD. Our results suggest that ALZ PINNACLE may provide a valuable framework for uncovering novel insights into AD neurobiology.
Submitted 16 November, 2024;
originally announced November 2024.
-
Biometrics in Extended Reality: A Review
Authors:
Ayush Agarwal,
Raghavendra Ramachandra,
Sushma Venkatesh,
S. R. Mahadeva Prasanna
Abstract:
In the domain of Extended Reality (XR), particularly Virtual Reality (VR), extensive research has been devoted to harnessing this transformative technology in various real-world applications. However, a critical challenge that must be addressed before unleashing the full potential of XR in practical scenarios is to ensure robust security and safeguard user privacy. This paper presents a systematic survey of the utility of biometric characteristics applied in the XR environment. To this end, we present a comprehensive overview of the different types of biometric modalities used for authentication and representation of users in a virtual environment. We discuss different biometric vulnerability gateways in general XR systems for the first time in the literature along with taxonomy. A comprehensive discussion on generating and authenticating biometric-based photorealistic avatars in XR environments is presented with a stringent taxonomy. We also discuss the availability of different datasets that are widely employed in evaluating biometric authentication in XR environments together with performance evaluation metrics. Finally, we discuss the open challenges and potential future work that need to be addressed in the field of biometrics in XR.
Submitted 14 November, 2024;
originally announced November 2024.
-
The Unintended Carbon Consequences of Bitcoin Mining Bans: A Paradox in Environmental Policy
Authors:
Juan Ignacio Ibañez,
Aayush Ladda,
Paolo Tasca,
Logan Aldred
Abstract:
The environmental impact of Bitcoin mining has become a significant concern, prompting several governments to consider or implement bans on cryptocurrency mining. However, these well-intentioned policies may lead to unintended consequences, notably the redirection of mining activities to regions with higher carbon intensities. This study aims to quantify the environmental effectiveness of Bitcoin mining bans by estimating the resultant carbon emissions from displaced mining operations. Our findings indicate that, contrary to policy goals, Bitcoin mining bans in low-emission countries can result in a net increase in global carbon emissions, a form of aggravated carbon leakage. We further explore the policy implications of these results, suggesting that more nuanced approaches may be required to mitigate the environmental impact of cryptocurrency mining effectively. This research contributes to the broader discourse on sustainable cryptocurrency regulation and provides a data-driven foundation for evaluating the true environmental costs of Bitcoin regulatory policies.
Submitted 28 October, 2024;
originally announced November 2024.
-
Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation
Authors:
Aayush Shah,
Shankar Jayaratnam
Abstract:
Large language models (LLMs) have demonstrated significant success in natural language processing (NLP) tasks and have shown promising results in other domains such as protein sequence generation. However, there remain salient differences between LLMs used for NLP, which effectively handle multiple tasks and are available in small sizes, and protein language models that are often specialized for specific tasks and only exist in larger sizes. In this work, we introduce two small protein language models, based on Llama-3-8B and Phi-3-mini, that are capable of both uncontrollable and controllable protein generation. For the uncontrollable generation task, our best model achieves an average pLDDT score of 69.75, demonstrating robust performance in generating viable protein structures. For the controllable generation task, in which the model generates proteins according to properties specified in the prompt, we achieve a remarkable average TM-Score of 0.84, indicating high structural similarity to target proteins. We chose 10 properties, including six classes of enzymes, to extend the capabilities of prior protein language models. Our approach utilizes the Low-Rank Adaptor (LoRA) technique, reducing trainable parameters to just 4% of the original model size, lowering computational requirements. By using a subset of the UniRef50 dataset and small models, we reduced the overall training time by 70% without compromising performance. Notably, Phi-3-mini reduced trainable parameters by 60%, decreasing training cost by 30% compared to Llama 3. Consequently, Phi-3 achieved a comparable TM-Score of 0.81, demonstrating that smaller models can match the performance of larger ones, like Llama 3. We also demonstrate the deployment of our models on the energy efficient ET-SoC-1 chip, significantly improving the TPS/W by a factor of 3.
Submitted 8 November, 2024;
originally announced November 2024.
-
Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition
Authors:
Abhisek Ray,
Ayush Raj,
Maheshkumar H. Kolekar
Abstract:
Extracting multiscale contextual information and higher-order correlations among skeleton sequences using Graph Convolutional Networks (GCNs) alone is inadequate for effective action classification. Hypergraph convolution addresses the above issues but cannot harness the long-range dependencies. The transformer proves to be effective in capturing these dependencies and making complex contextual features accessible. We propose an Autoregressive Adaptive HyperGraph Transformer (AutoregAd-HGformer) model for in-phase (autoregressive and discrete) and out-phase (adaptive) hypergraph generation. The vector quantized in-phase hypergraph equipped with powerful autoregressive learned priors produces a more robust and informative representation suitable for hyperedge formation. The out-phase hypergraph generator provides a model-agnostic hyperedge learning technique to align the attributes with input skeleton embedding. The hybrid (supervised and unsupervised) learning in AutoregAd-HGformer explores the action-dependent feature along spatial, temporal, and channel dimensions. The extensive experimental results and ablation study indicate the superiority of our model over state-of-the-art hypergraph architectures on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
Submitted 27 February, 2025; v1 submitted 8 November, 2024;
originally announced November 2024.
-
Integrating RIS into HAP Networks for Improved Connectivity
Authors:
Islam M. Tanash,
Ayush Kumar Dwivedi,
Taneli Riihonen
Abstract:
This paper investigates a high-altitude platform (HAP) network enhanced with reconfigurable intelligent surfaces (RISs). The arbitrary placement of HAPs and RISs is modeled using stochastic geometry, specifically as homogeneous Poisson point processes. The HAP--RIS links are assumed to follow Rician fading, while the RIS--user links experience shadowed-Rician fading. The system's coverage probability and ergodic capacity are derived analytically and validated through Monte Carlo simulations. The results highlight significant performance gains and demonstrate the influence of various system parameters and fading conditions. The proposed system has potential for enhancing connectivity and data offloading in practical scenarios.
Submitted 6 November, 2024;
originally announced November 2024.
-
Morphology of 32 Repeating Fast Radio Burst Sources at Microsecond Time Scales with CHIME/FRB
Authors:
Alice P. Curtin,
Ketan R. Sand,
Ziggy Pleunis,
Naman Jain,
Victoria Kaspi,
Daniele Michilli,
Emmanuel Fonseca,
Kaitlyn Shin,
Kenzie Nimmo,
Charanjot Brar,
Fengqiu Adam Dong,
Gwendolyn M. Eadie,
B. M. Gaensler,
Antonio Herrera-Martin,
Adaeze L. Ibik,
Ronny C. Joseph,
Jane Kaczmarek,
Calvin Leung,
Robert Main,
Kiyoshi W. Masui,
Ryan McKinven,
Juan Mena-Parra,
Cherry Ng,
Ayush Pandhi,
Aaron B. Pearlman
, et al. (5 additional authors not shown)
Abstract:
The Canadian Hydrogen Intensity Mapping Experiment Fast Radio Burst (CHIME/FRB) project has discovered the most repeating fast radio burst (FRB) sources of any telescope. However, most of the physical conclusions derived from this sample are based on data with a time resolution of $\sim$1 ms. In this work, we present for the first time a morphological analysis of the raw voltage data for 118 bursts from 32 of CHIME/FRB's repeating sources. We do not find any significant correlations amongst fluence, dispersion measure (DM), burst rate, and burst duration. Performing the first large-scale morphological comparison at timescales down to microseconds between our repeating sources and 125 non-repeating FRBs, we find that repeaters are narrower in frequency and broader in duration than non-repeaters, supporting previous findings. However, we find that the duration-normalized sub-burst widths of the two populations are consistent, possibly suggesting a shared physical emission mechanism. Additionally, we find that the spectral fluences of the two are consistent. When combined with the larger bandwidths and previously found larger DMs of non-repeaters, this suggests that non-repeaters may have higher intrinsic specific energies than repeating FRBs. We do not find any consistent increase or decrease in the DM ($\lessapprox 1$ pc cm$^{-3}$ yr$^{-1}$) and scattering timescales ($\lessapprox 2$ ms yr$^{-1}$) of our sources over $\sim2-4$ year periods.
Submitted 5 November, 2024;
originally announced November 2024.
-
Beyond the Traditional VIX: A Novel Approach to Identifying Uncertainty Shocks in Financial Markets
Authors:
Ayush Jha,
Abootaleb Shirvani,
Svetlozar T. Rachev,
Frank J. Fabozzi
Abstract:
We introduce a new identification strategy for uncertainty shocks to explain macroeconomic volatility in financial markets. The Chicago Board Options Exchange Volatility Index (VIX) measures market expectations of future volatility, but traditional methods based on second-moment shocks and time-varying volatility of the VIX often fail to capture the non-Gaussian, heavy-tailed nature of asset returns. To address this, we construct a revised VIX by fitting a double-subordinated Normal Inverse Gaussian Levy process to S&P 500 option prices, providing a more comprehensive measure of volatility that reflects the extreme movements and heavy tails observed in financial data. Using an axiomatic approach, we introduce a general family of risk-reward ratios, computed with our revised VIX and fitted over a fractional time series to more accurately identify uncertainty shocks in financial markets.
Submitted 4 November, 2024;
originally announced November 2024.
-
A High-Resolution, US-scale Digital Similar of Interacting Livestock, Wild Birds, and Human Ecosystems with Applications to Multi-host Epidemic Spread
Authors:
Abhijin Adiga,
Ayush Chopra,
Mandy L. Wilson,
S. S. Ravi,
Dawen Xie,
Samarth Swarup,
Bryan Lewis,
John Barnes,
Ramesh Raskar,
Madhav V. Marathe
Abstract:
One Health issues, such as the spread of highly pathogenic avian influenza (HPAI), present significant challenges at the human-animal-environmental interface. Recent H5N1 outbreaks underscore the need for comprehensive modeling efforts that capture the complex interactions between various entities in these interconnected ecosystems. To support such efforts, we develop a methodology to construct a synthetic spatiotemporal gridded dataset of livestock production and processing, human population, and wild birds for the contiguous United States, called a "digital similar". This representation is a result of fusing diverse datasets using statistical and optimization techniques, followed by extensive verification and validation. The livestock component includes farm-level representations of four major livestock types (cattle, poultry, swine, and sheep), including further categorization into subtypes such as dairy cows, beef cows, chickens, turkeys, ducks, etc. Weekly abundance data for wild bird species identified in the transmission of avian influenza are included. Gridded distributions of the human population, along with demographic and occupational features, capture the placement of agricultural workers and the general population. We demonstrate how the digital similar can be applied to evaluate spillover risk to dairy cows and poultry from wild bird population, then validate these results using historical H5N1 incidences. The resulting subtype-specific spatiotemporal risk maps identify hotspots of high risk from H5N1 infected wild bird population to dairy cattle and poultry operations, thus guiding surveillance efforts.
Submitted 7 March, 2025; v1 submitted 2 November, 2024;
originally announced November 2024.
-
Re-thinking Richardson-Lucy without Iteration Cutoffs: Physically Motivated Bayesian Deconvolution
Authors:
Zachary H. Hendrix,
Peter T. Brown,
Tim Flanagan,
Douglas P. Shepherd,
Ayush Saurabh,
Steve Pressé
Abstract:
Richardson-Lucy deconvolution is widely used to restore images from degradation caused by the broadening effects of a point spread function and corruption by photon shot noise, in order to recover an underlying object. In practice, this is achieved by iteratively maximizing a Poisson emission likelihood. However, the RL algorithm is known to prefer sparse solutions and overfit noise, leading to high-frequency artifacts. The structure of these artifacts is sensitive to the number of RL iterations, and this parameter is typically hand-tuned to achieve reasonable perceptual quality of the inferred object. Overfitting can be mitigated by introducing tunable regularizers or other ad hoc iteration cutoffs in the optimization as otherwise incorporating fully realistic models can introduce computational bottlenecks. To resolve these problems, we present Bayesian deconvolution, a rigorous deconvolution framework that combines a physically accurate image formation model avoiding the challenges inherent to the RL approach. Our approach achieves deconvolution while satisfying the following desiderata:
I deconvolution is performed in the spatial domain (as opposed to the frequency domain) where all known noise sources are accurately modeled and integrated in the spirit of providing full probability distributions over the density of the putative object recovered;
II the probability distribution is estimated without making assumptions on the sparsity or continuity of the underlying object;
III unsupervised inference is performed and converges to a stable solution with no user-dependent parameter tuning or iteration cutoff;
IV deconvolution produces strictly positive solutions; and
V implementation is amenable to fast, parallelizable computation.
Submitted 1 November, 2024;
originally announced November 2024.
-
Blind Time-of-Flight Imaging: Sparse Deconvolution on the Continuum with Unknown Kernels
Authors:
Ruiming Guo,
Ayush Bhandari
Abstract:
In recent years, computational Time-of-Flight (ToF) imaging has emerged as an exciting and a novel imaging modality that offers new and powerful interpretations of natural scenes, with applications extending to 3D, light-in-flight, and non-line-of-sight imaging. Mathematically, ToF imaging relies on algorithmic super-resolution, as the back-scattered sparse light echoes lie on a finer time resolution than what digital devices can capture. Traditional methods necessitate knowledge of the emitted light pulses or kernels and employ sparse deconvolution to recover scenes. Unlike previous approaches, this paper introduces a novel, blind ToF imaging technique that does not require kernel calibration and recovers sparse spikes on a continuum, rather than a discrete grid. By studying the shared characteristics of various ToF modalities, we capitalize on the fact that most physical pulses approximately satisfy the Strang-Fix conditions from approximation theory. This leads to a new mathematical formulation for sparse super-resolution. Our recovery approach uses an optimization method that is pivoted on an alternating minimization strategy. We benchmark our blind ToF method against traditional kernel calibration methods, which serve as the baseline. Extensive hardware experiments across different ToF modalities demonstrate the algorithmic advantages, flexibility and empirical robustness of our approach. We show that our work facilitates super-resolution in scenarios where distinguishing between closely spaced objects is challenging, while maintaining performance comparable to known kernel situations. Examples of light-in-flight imaging and light-sweep videos highlight the practical benefits of our blind super-resolution method in enhancing the understanding of natural scenes.
Submitted 31 October, 2024;
originally announced November 2024.
-
TaxaBind: A Unified Embedding Space for Ecological Applications
Authors:
Srikumar Sastry,
Subash Khanal,
Aayush Dhakal,
Adeel Ahmad,
Nathan Jacobs
Abstract:
We present TaxaBind, a unified embedding space for characterizing any species of interest. TaxaBind is a multimodal embedding space across six modalities: ground-level images of species, geographic location, satellite image, text, audio, and environmental features, useful for solving ecological problems. To learn this joint embedding space, we leverage ground-level images of species as a binding modality. We propose multimodal patching, a technique for effectively distilling the knowledge from various modalities into the binding modality. We construct two large datasets for pretraining: iSatNat with species images and satellite images, and iSoundNat with species images and audio. Additionally, we introduce TaxaBench-8k, a diverse multimodal dataset with six paired modalities for evaluating deep learning models on ecological tasks. Experiments with TaxaBind demonstrate its strong zero-shot and emergent capabilities on a range of tasks including species classification, cross-model retrieval, and audio classification. The datasets and models are made available at https://github.com/mvrl/TaxaBind.
Submitted 1 November, 2024;
originally announced November 2024.
-
frb-voe: A Real-time Virtual Observatory Event Alert Service for Fast Radio Bursts
Authors:
Thomas C. Abbott,
Andrew V. Zwaniga,
Charanjot Brar,
Victoria M. Kaspi,
Emily Petroff,
Mohit Bhardwaj,
P. J. Boyle,
Amanda M. Cook,
Ronny C. Joseph,
Kiyoshi W. Masui,
Ayush Pandhi,
Ziggy Pleunis,
Paul Scholz,
Kaitlyn Shin,
Shriharsh Tendulkar
Abstract:
We present frb-voe, a publicly available software package that enables radio observatories to broadcast fast radio burst (FRB) alerts to subscribers through low-latency virtual observatory events (VOEvents). We describe a use-case of frb-voe by the Canadian Hydrogen Intensity Mapping Experiment Fast Radio Burst (CHIME/FRB) Collaboration, which has broadcast thousands of FRB alerts to subscribers worldwide. Using this service, observers have daily opportunities to conduct rapid multi-wavelength follow-up observations of new FRB sources. Alerts are distributed as machine-readable reports and as emails containing FRB metadata, and are available to the public within approximately 13 seconds of detection. A sortable database and a downloadable JSON file containing FRB metadata from all broadcast alerts can be found on the CHIME/FRB public webpage. The frb-voe service also provides users with the ability to retrieve FRB names from the Transient Name Server (TNS) through the frb-voe client user interface (CLI). The frb-voe service can act as a foundation on which any observatory that detects FRBs can build its own VOEvent broadcasting service to contribute to the coordinated multi-wavelength follow-up of astrophysical transients.
Submitted 29 October, 2024;
originally announced October 2024.
-
ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
Authors:
Ashutosh Chaubey,
Anoubhav Agarwaal,
Sartaki Sinha Roy,
Aayush Agrawal,
Susmita Ghose
Abstract:
Contextual advertising serves ads that are aligned to the content that the user is viewing. The rapid growth of video content on social platforms and streaming services, along with privacy concerns, has increased the need for contextual advertising. Placing the right ad in the right context creates a seamless and pleasant ad viewing experience, resulting in higher audience engagement and, ultimately, better ad monetization. From a technology standpoint, effective contextual advertising requires a video retrieval system capable of understanding complex video content at a very granular level. Current text-to-video retrieval models based on joint multimodal training demand large datasets and computational resources, limiting their practicality and lacking the key functionalities required for ad ecosystem integration. We introduce ContextIQ, a multimodal expert-based video retrieval system designed specifically for contextual advertising. ContextIQ utilizes modality-specific experts (video, audio, transcript (captions), and metadata such as objects, actions, emotion, etc.) to create semantically rich video representations. We show that our system, without joint training, achieves better or comparable results to state-of-the-art models and commercial solutions on multiple text-to-video retrieval benchmarks. Our ablation studies highlight the benefits of leveraging multiple modalities for enhanced video retrieval accuracy instead of using a vision-language model alone. Furthermore, we show how video retrieval systems such as ContextIQ can be used for contextual advertising in an ad ecosystem while also addressing concerns related to brand safety and filtering inappropriate content.
Submitted 29 March, 2025; v1 submitted 29 October, 2024;
originally announced October 2024.
-
Are VLMs Really Blind
Authors:
Ayush Singh,
Mansi Gupta,
Shivank Garg
Abstract:
Vision Language Models excel in handling a wide range of complex tasks, including Optical Character Recognition (OCR), Visual Question Answering (VQA), and advanced geometric reasoning. However, these models fail to perform well on low-level basic visual tasks which are especially easy for humans. Our goal in this work was to determine if these models are truly "blind" to geometric reasoning or if there are ways to enhance their capabilities in this area. Our work presents a novel automatic pipeline designed to extract key information from images in response to specific questions. Instead of just relying on direct VQA, we use question-derived keywords to create a caption that highlights important details in the image related to the question. This caption is then used by a language model to provide a precise answer to the question without requiring external fine-tuning.
Submitted 29 October, 2024;
originally announced October 2024.
-
Unleashing Dynamic Range and Resolution in Unlimited Sensing Framework via Novel Hardware
Authors:
Yuliang Zhu,
Ayush Bhandari
Abstract:
Conventional digitization based on the Shannon-Nyquist method, implemented via analog-to-digital converters (ADCs), faces fundamental limitations. High-dynamic-range (HDR) signals often get clipped or saturated in practice. Given a fixed bit budget, one must choose between minimizing quantization noise and accommodating HDR inputs. The Unlimited Sensing Framework (USF) eliminates saturation by incorporating nonlinear folding in analog hardware, resulting in modulo signals. Quantizing or digitizing modulo signals enables low quantization noise, as the modulo representation maps HDR signals into low-dynamic-range (LDR) samples. In the context of USF, the core innovation of this paper is a novel, low-cost, efficient, integrator-based modulo ADC hardware implementation that imposes no restrictions on folding rates, enabling capture at a significantly higher dynamic range. The feasibility of this design is demonstrated by hardware experiments showcasing clear advantages across different quantitative performance metrics. These include capturing HDR signals with a 60-fold increase in dynamic range, achieving up to 5 Effective Number of Bits (ENOBs), and improving Signal-to-Noise and Distortion (SINAD) by 30 dB.
Submitted 26 October, 2024;
originally announced October 2024.
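The modulo folding mentioned in the abstract above can be illustrated in software. The following is a minimal sketch, not the paper's integrator-based hardware: it assumes a centered modulo with threshold `lam`, and a simple difference-based recovery that only works when consecutive samples differ by less than `lam` and the first sample is unfolded. The function names `fold` and `unfold` and the test signal are hypothetical.

```python
import numpy as np

def fold(x, lam):
    """Centered modulo: map every sample into the range [-lam, lam)."""
    return np.mod(x + lam, 2 * lam) - lam

def unfold(y, lam):
    """Recover x from folded samples y.

    Assumes |x[k+1] - x[k]| < lam for all k and |x[0]| < lam, so that
    folding the differences of y returns the true differences of x.
    """
    d = fold(np.diff(y), lam)
    return y[0] + np.concatenate(([0.0], np.cumsum(d)))

# Hypothetical test signal: amplitude 5 against a folding threshold of 1.
t = np.linspace(0.0, 1.0, 2000)
x = 5.0 * np.sin(2 * np.pi * 3 * t)
lam = 1.0

y = fold(x, lam)        # low-dynamic-range samples
x_hat = unfold(y, lam)  # reconstruction of the high-dynamic-range signal
```

The folded samples `y` never leave the range set by `lam`, which is why a fixed bit budget can be spent on resolution rather than on headroom. The recovery step here relies on dense sampling; other reconstruction methods exist and may have weaker requirements.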
-
Real-Time Weapon Detection Using YOLOv8 for Enhanced Safety
Authors:
Ayush Thakur,
Akshat Shrivastav,
Rohan Sharma,
Triyank Kumar,
Kabir Puri
Abstract:
This research paper presents the development of an AI model utilizing YOLOv8 for real-time weapon detection, aimed at enhancing safety in public spaces such as schools, airports, and public transportation systems. As incidents of violence continue to rise globally, there is an urgent need for effective surveillance technologies that can quickly identify potential threats. Our approach focuses on leveraging advanced deep learning techniques to create a highly accurate and efficient system capable of detecting weapons in real-time video streams. The model was trained on a comprehensive dataset containing thousands of images depicting various types of firearms and edged weapons, ensuring a robust learning process. We evaluated the model's performance using key metrics such as precision, recall, F1-score, and mean Average Precision (mAP) across multiple Intersection over Union (IoU) thresholds, revealing a significant capability to differentiate between weapon and non-weapon classes with minimal error. Furthermore, we assessed the system's operational efficiency, demonstrating that it can process frames at high speeds suitable for real-time applications. The findings indicate that our YOLOv8-based weapon detection model not only contributes to the existing body of knowledge in computer vision but also addresses critical societal needs for improved safety measures in vulnerable environments. By harnessing the power of artificial intelligence, this research lays the groundwork for developing practical solutions that can be deployed in security settings, ultimately enhancing the protective capabilities of law enforcement and public safety agencies.
Submitted 23 October, 2024;
originally announced October 2024.
-
Peptide-GPT: Generative Design of Peptides using Generative Pre-trained Transformers and Bio-informatic Supervision
Authors:
Aayush Shah,
Chakradhar Guntuboina,
Amir Barati Farimani
Abstract:
In recent years, natural language processing (NLP) models have demonstrated remarkable capabilities in various domains beyond traditional text generation. In this work, we introduce PeptideGPT, a protein language model tailored to generate protein sequences with distinct properties: hemolytic activity, solubility, and non-fouling characteristics. To facilitate a rigorous evaluation of these generated sequences, we established a comprehensive evaluation pipeline consisting of ideas from bioinformatics to retain valid proteins with ordered structures. First, we rank the generated sequences based on their perplexity scores, then we filter out those lying outside the permissible convex hull of proteins. Finally, we predict the structure using ESMFold and select the proteins with pLDDT values greater than 70 to ensure ordered structure. The properties of generated sequences are evaluated using task-specific classifiers - PeptideBERT and HAPPENN. We achieved an accuracy of 76.26% in hemolytic, 72.46% in non-hemolytic, 78.84% in non-fouling, and 68.06% in solubility protein generation. Our experimental results demonstrate the effectiveness of PeptideGPT in de novo protein design and underscore the potential of leveraging NLP-based approaches for paving the way for future innovations and breakthroughs in synthetic biology and bioinformatics. Codes, models, and data used in this study are freely available at: https://github.com/aayush-shah14/PeptideGPT.
Submitted 24 October, 2024;
originally announced October 2024.
-
Large Language Models for Financial Aid in Financial Time-series Forecasting
Authors:
Md Khairul Islam,
Ayush Karmacharya,
Timothy Sue,
Judy Fox
Abstract:
Considering the difficulty of financial time series forecasting in financial aid, much of the current research focuses on leveraging big data analytics in financial services. One modern approach is to utilize "predictive analysis", analogous to forecasting financial trends. However, many of these time series data in Financial Aid (FA) pose unique challenges due to limited historical datasets and high dimensional financial information, which hinder the development of effective predictive models that balance accuracy with efficient runtime and memory usage. Pre-trained foundation models are employed to address these challenging tasks. We use state-of-the-art time series models including pre-trained LLMs (GPT-2 as the backbone), transformers, and linear models to demonstrate their ability to outperform traditional approaches, even with minimal ("few-shot") or no fine-tuning ("zero-shot"). Our benchmark study, which includes financial aid with seven other time series tasks, shows the potential of using LLMs for scarce financial datasets.
Submitted 24 October, 2024;
originally announced October 2024.
-
Exceptional groups of order $p^6$ for primes $p\geq 5$
Authors:
E. A. O'Brien,
Sunil Kumar Prajapati,
Ayush Udeep
Abstract:
The minimal faithful permutation degree $μ(G)$ of a finite group $G$ is the least integer $n$ such that $G$ is isomorphic to a subgroup of the symmetric group $S_n$. If $G$ has a normal subgroup $N$ such that $μ(G/N) > μ(G)$, then $G$ is exceptional. We prove that the proportion of exceptional groups of order $p^6$ for primes $p \geq 5$ is asymptotically 0. We identify $(11p+107)/2$ such groups and conjecture that there are no others.
Submitted 23 October, 2024;
originally announced October 2024.
-
Energy-Optimal Planning of Waypoint-Based UAV Missions -- Does Minimum Distance Mean Minimum Energy?
Authors:
Nicolas Michel,
Ayush Patnaik,
Zhaodan Kong,
Xinfan Lin
Abstract:
The multirotor unmanned aerial vehicle is a prevalent type of aerial robot with wide real-world applications. The energy efficiency of the robot is a critical aspect of its performance, determining the range and duration of the missions that can be performed. This paper studies the energy-optimal planning of the multirotor, which aims at finding the ordering of waypoints with the minimum energy consumption for missions in 3D space. The study is based on a previously developed model capturing the first-principle energy dynamics of the multirotor. We found that in the majority of cases (up to 95%) the solutions of the energy-optimal planning differ from those of the traditional traveling salesman problem, which minimizes the total distance. The difference can be as high as 14.9%, with the average at 1.6%-3.3% and the 90th percentile at 3.7%-6.5%, depending on the range and number of waypoints in the mission. We then identified and explained the key features of the minimum-energy order by correlating them to the underlying flight energy dynamics. It is shown that, instead of minimizing the distance, coordinating vertical and horizontal motion to promote aerodynamic efficiency is the key to optimizing energy consumption.
Submitted 23 October, 2024;
originally announced October 2024.
-
The Interplay Between Physical Activity, Protein Consumption, and Sleep Quality in Muscle Protein Synthesis
Authors:
Ayush Devkota,
Manakamana Gautam,
Uttam Dhakal,
Suman Devkota,
Gaurav Kumar Gupta,
Ujjwal Nepal,
Amey Dinesh Dhuru,
Aniket Kumar Singh
Abstract:
This systematic review examines the synergistic and individual influences of resistance exercise, dietary protein supplementation, and sleep/recovery on muscle protein synthesis (MPS). Electronic databases such as Scopus, Google Scholar, and Web of Science were extensively used. Studies were selected based on their relevance to the criteria and direct applicability to the objectives. Research indicates that a protein dose of 20 to 25 grams maximally stimulates MPS post-resistance training. It is observed that physically frail individuals aged 76 to 92 and middle-aged adults aged 62 to 74 have lower mixed muscle protein synthetic rates than individuals aged 20 to 32. High-whey protein and leucine-enriched supplements enhance MPS more efficiently than standard dairy products in older adults engaged in resistance programs. Similarly, protein intake before sleep boosts overnight MPS rates, which helps prevent muscle loss associated with sleep debt, exercise-induced damage, and muscle-wasting conditions like sarcopenia and cachexia. Resistance exercise is a functional intervention to achieve muscular adaptation and improve function. Future research should focus on variables such as fluctuating fitness levels, age groups, genetics, and lifestyle factors to generate more accurate and beneficial results.
Submitted 21 October, 2024;
originally announced October 2024.
-
LLC Intra-set Write Balancing
Authors:
Keshav Krishna,
Ayush Verma
Abstract:
The increasing use of Non-Volatile Memory (NVM) in computer architecture has brought about new challenges, one of which is the write endurance problem. Frequent writes to a particular cache cell in NVM can lead to degradation of the memory cell and reduce its lifespan. To solve this problem, we propose a sample-based blocking technique for the Last Level Cache (LLC). Our approach involves defining a threshold value and sampling a subset of cache sets. If the number of writes to a way in a sampled set exceeds the threshold, the way is blocked, and writes are redirected to other ways. We also maintain a history structure to record the number of writes in a set and a PC-Table to use for blocking in unsampled sets. Based on the outcome of blocking in sampled sets, the variance of the values stored in the history structure is used to determine whether blocking had a positive impact, and the value corresponding to the instruction pointer is incremented or decremented accordingly. This value is later used for blocking in unsampled sets. Our results show that our approach significantly balances write traffic to the cache and improves the overall lifespan of the memory cells while performing better than the baseline system. Our approach can also be applied to other cache hierarchies and NVM technologies to mitigate the problem of write endurance.
Submitted 20 October, 2024;
originally announced October 2024.
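The threshold-based way blocking described in the abstract above can be sketched as a toy simulation of a single cache set. This is only an illustration under stated assumptions: the function `simulate_set`, the redirect policy (send a blocked write to the least-written unblocked way), and the parameter values are hypothetical, and the history structure and PC-Table from the abstract are not modeled.

```python
def simulate_set(write_stream, num_ways=4, threshold=3):
    """Count writes per way in one cache set with threshold-based blocking.

    write_stream: iterable of way indices that each write originally targets.
    A way is blocked once its write count exceeds `threshold`; later writes
    aimed at a blocked way are redirected (assumed policy: to the
    least-written unblocked way).
    """
    writes = [0] * num_ways
    blocked = [False] * num_ways
    for way in write_stream:
        if blocked[way]:
            candidates = [w for w in range(num_ways) if not blocked[w]]
            if candidates:  # if every way is blocked, keep the original target
                way = min(candidates, key=lambda w: writes[w])
        writes[way] += 1
        if writes[way] > threshold:
            blocked[way] = True
    return writes

# Ten writes that all target way 0.
hot_stream = [0] * 10
balanced = simulate_set(hot_stream, num_ways=4, threshold=3)
unbalanced = simulate_set(hot_stream, num_ways=4, threshold=100)  # effectively no blocking
```

In this toy run the unblocked case concentrates all ten writes on one way, while the blocked case spreads them across the set, which is the balancing effect the abstract aims for.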
-
Interpolation techniques for reconstructing Galactic Faraday rotation
Authors:
Affan Khadir,
Ayush Pandhi,
Sebastian Hutschenreuter,
Bryan Gaensler,
Shannon Vanderwoude,
Jennifer West,
Shane O'Sullivan
Abstract:
The line-of-sight structure of the Galactic magnetic field (GMF) can be studied using Faraday rotation measure (RM) grids. We analyze how the choice of interpolation kernel can affect the accuracy and reliability of reconstructed RM maps. We test the following kernels: inverse distance weighting (IDW), natural neighbour interpolation (NNI), inverse multiquadric interpolation (IM), thin-plate spline interpolation (TPS), and a Bayesian rotation measure sky (BRMS). All techniques were tested on two simulated Galactic foreground RMs (one assuming the GMF has patchy structures and the other assuming it has filamentary structures) using magnetohydrodynamic simulations. Both foregrounds were sampled to form RM grids with densities of $\sim$40 sources deg$^{-2}$ and area $\sim$144 deg$^2$. The techniques were tested on data sets with different noise levels and Gaussian random extragalactic RM contributions. The data set that most closely emulates expected data from current surveys, such as the POlarization Sky Survey of the Universe's Magnetism (POSSUM), had extragalactic contributions and a noise standard deviation of $\sim 1.5$ rad m$^{-2}$. For this data set, the accuracy of the techniques for the patchy structures from best to worst was: BRMS, NNI, TPS, IDW and IM; while in the filamentary simulated foreground it was: BRMS, NNI, TPS, and IDW. IDW is the most computationally expensive technique, while TPS and IM are the least expensive. BRMS and NNI have the same, intermediate computational cost. This analysis lays the groundwork for Galactic RM studies with large radio polarization sky surveys, such as POSSUM.
Submitted 19 October, 2024;
originally announced October 2024.
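Of the kernels listed in the abstract above, inverse distance weighting is the simplest to write down. The sketch below is a generic textbook form of IDW on a flat 2D grid, not the paper's implementation: the function name `idw_interpolate`, the exponent `power=2`, the flat-sky geometry, and the two sample points are all assumptions made for illustration.

```python
import numpy as np

def idw_interpolate(xy_known, values, xy_query, power=2.0, eps=1e-12):
    """Inverse distance weighting: weighted mean with weights 1 / distance**power.

    xy_known: (K, 2) source positions; values: (K,) measurements at those positions;
    xy_query: (Q, 2) positions to interpolate to. `eps` avoids division by zero
    when a query point coincides with a source.
    """
    # Pairwise distances between every query point and every known source, shape (Q, K).
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, eps) ** power
    return (w @ values) / w.sum(axis=1)

# Hypothetical RM samples (rad m^-2) at two positions (degrees, flat-sky approximation).
sources = np.array([[0.0, 0.0], [2.0, 0.0]])
rm = np.array([1.0, 3.0])
queries = np.array([[1.0, 0.0], [0.0, 0.0]])  # the midpoint, and a point on a source

rm_hat = idw_interpolate(sources, rm, queries)
```

At the midpoint both sources get equal weight, so the estimate is their mean; at a source position the estimate collapses to that source's value. Because every query uses every source, this dense form scales as the product of the two counts.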
-
Scaling Wearable Foundation Models
Authors:
Girish Narayanswamy,
Xin Liu,
Kumar Ayush,
Yuzhe Yang,
Xuhai Xu,
Shun Liao,
Jake Garrison,
Shyam Tailor,
Jake Sunshine,
Yun Liu,
Tim Althoff,
Shrikanth Narayanan,
Pushmeet Kohli,
Jiening Zhan,
Mark Malhotra,
Shwetak Patel,
Samy Abdel-Ghaffar,
Daniel McDuff
Abstract:
Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition.
Submitted 17 October, 2024;
originally announced October 2024.
-
Syn2Real Domain Generalization for Underwater Mine-like Object Detection Using Side-Scan Sonar
Authors:
Aayush Agrawal,
Aniruddh Sikdar,
Rajini Makam,
Suresh Sundaram,
Suresh Kumar Besai,
Mahesh Gopi
Abstract:
Underwater mine detection with deep learning suffers from limitations due to the scarcity of real-world data. This scarcity leads to overfitting, where models perform well on training data but poorly on unseen data. This paper proposes a Syn2Real (Synthetic to Real) domain generalization approach using diffusion models to address this challenge. We demonstrate that synthetic data generated with noise by DDPM and DDIM models, even if not perfectly realistic, can effectively augment real-world samples for training. The residual noise in the final sampled images improves the model's ability to generalize to real-world data with inherent noise and high variation. The baseline Mask-RCNN model, when trained on a combination of synthetic and original training datasets, exhibited approximately a 60% increase in Average Precision (AP) compared to being trained solely on the original training data. This significant improvement highlights the potential of Syn2Real domain generalization for underwater mine detection tasks.
Submitted 16 October, 2024;
originally announced October 2024.
-
K-Contact Distance for Noisy Nonhomogeneous Spatial Point Data with application to Repeating Fast Radio Burst sources
Authors:
A. M. Cook,
Dayi Li,
Gwendolyn M. Eadie,
David C. Stenning,
Paul Scholz,
Derek Bingham,
Radu Craiu,
B. M. Gaensler,
Kiyoshi W. Masui,
Ziggy Pleunis,
Antonio Herrera-Martin,
Ronniy C. Joseph,
Ayush Pandhi,
Aaron B. Pearlman,
J. Xavier Prochaska
Abstract:
This paper introduces an approach to analyze nonhomogeneous Poisson processes (NHPP) observed with noise, focusing on previously unstudied second-order characteristics of the noisy process. Utilizing a hierarchical Bayesian model with noisy data, we estimate hyperparameters governing a physically motivated NHPP intensity. Simulation studies demonstrate the reliability of this methodology in accurately estimating hyperparameters. Leveraging the posterior distribution, we then infer the probability of detecting a certain number of events within a given radius, the $k$-contact distance. We demonstrate our methodology with an application to observations of fast radio bursts (FRBs) detected by the Canadian Hydrogen Intensity Mapping Experiment's FRB Project (CHIME/FRB). This approach allows us to identify repeating FRB sources by bounding or directly simulating the probability of observing $k$ physically independent sources within some radius in the detection domain, or the $\textit{probability of coincidence}$ ($P_{\text{C}}$). The new methodology improves the repeater detection $P_{\text{C}}$ in 86% of cases when applied to the largest sample of previously classified observations, with a median improvement factor (existing metric over $P_{\text{C}}$ from our methodology) of $\sim$ 3000.
Submitted 15 October, 2024;
originally announced October 2024.
-
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
Authors:
Ayush Jain,
Norio Kosaka,
Xinhu Li,
Kyung-Min Kim,
Erdem Bıyık,
Joseph J. Lim
Abstract:
In reinforcement learning, off-policy actor-critic approaches like DDPG and TD3 are based on the deterministic policy gradient. Herein, the Q-function is trained from off-policy environment data and the actor (policy) is trained to maximize the Q-function via gradient ascent. We observe that in complex tasks like dexterous manipulation and restricted locomotion, the Q-value is a complex function of action, having several local optima or discontinuities. This poses a challenge for gradient ascent to traverse and makes the actor prone to get stuck at local optima. To address this, we introduce a new actor architecture that combines two simple insights: (i) use multiple actors and evaluate the Q-value maximizing action, and (ii) learn surrogates to the Q-function that are simpler to optimize with gradient-based methods. We evaluate tasks such as restricted locomotion, dexterous manipulation, and large discrete-action space recommender systems and show that our actor finds optimal actions more frequently and outperforms alternate actor architectures.
Submitted 15 October, 2024;
originally announced October 2024.
-
Preliminary Evaluation of an Ultrasound-Guided Robotic System for Autonomous Percutaneous Intervention
Authors:
Pratima Mohan,
Aayush Agrawal,
Niravkumar A. Patel
Abstract:
Cancer cases have been rising globally, resulting in nearly 10 million deaths in 2023. Biopsy, crucial for diagnosis, is often performed under ultrasound (US) guidance, demanding precise hand coordination and cognitive decision-making. Robot-assisted interventions have shown improved accuracy in lesion targeting by addressing challenges such as noisy 2D images and maintaining consistent probe-to-surface contact. Recent research has focused on fully autonomous robotic US systems to enable standardized diagnostic procedures and reproducible US-guided therapy. This study presents a fully autonomous system for US-guided needle placement capable of performing an end-to-end clinical workflow. The system autonomously: 1) identifies the liver region on the patient's abdomen surface, 2) plans and executes the US scanning path using impedance control, 3) localizes lesions from the US images in real-time, and 4) targets the identified lesions, all without human intervention. This study evaluates both position-controlled and impedance-controlled systems. Validation on agar phantoms demonstrated a targeting error of 5.74 ± 2.70 mm, highlighting its potential for accurately targeting tumors larger than 5 mm. These results show the potential of a fully autonomous system for US-guided biopsies.
Submitted 14 October, 2024;
originally announced October 2024.
-
Efficient Federated Unlearning under Plausible Deniability
Authors:
Ayush K. Varshney,
Vicenç Torra
Abstract:
Privacy regulations like the GDPR in Europe and the CCPA in the US give users the right to remove their data from ML applications. Machine unlearning addresses this by modifying the ML parameters in order to forget the influence of a specific data point on its weights. Recent literature has highlighted that the contribution from data point(s) can be forged with some other data points in the dataset with probability close to one. This allows a server to falsely claim unlearning without actually modifying the model's parameters. However, in distributed paradigms such as FL, where the server lacks access to the dataset and the number of clients is limited, claiming unlearning becomes a challenge. This paper introduces an efficient way to achieve federated unlearning by employing a privacy model which allows the FL server to plausibly deny the client's participation in the training up to a certain extent. We demonstrate that the server can generate a Proof-of-Deniability, where each aggregated update can be associated with at least x client updates. This enables the server to plausibly deny a client's participation. However, in the event of frequent unlearning requests, the server is required to adopt an unlearning strategy and, accordingly, update its model parameters. We also perturb the client updates in a cluster in order to avoid inference from an honest but curious server. We show that the global model satisfies differential privacy after T communication rounds. The proposed methodology has been evaluated on multiple datasets in different privacy settings. The experimental results show that our framework achieves comparable utility while providing a significant reduction in terms of memory (30 times), as well as retraining time (1.6-500769 times). The source code for the paper is available.
Submitted 13 October, 2024;
originally announced October 2024.
-
Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
Authors:
Vinith M. Suriyakumar,
Rohan Alur,
Ayush Sekhari,
Manish Raghavan,
Ashia C. Wilson
Abstract:
Text-to-image diffusion models rely on massive, web-scale datasets. Training them from scratch is computationally expensive, and as a result, developers often prefer to make incremental updates to existing models. These updates often compose fine-tuning steps (to learn new concepts or improve model performance) with "unlearning" steps (to "forget" existing concepts, such as copyrighted works or explicit content). In this work, we demonstrate a critical and previously unknown vulnerability that arises in this paradigm: even under benign, non-adversarial conditions, fine-tuning a text-to-image diffusion model on seemingly unrelated images can cause it to "relearn" concepts that were previously "unlearned." We comprehensively investigate the causes and scope of this phenomenon, which we term concept resurgence, by performing a series of experiments which compose "concept unlearning" with subsequent fine-tuning of Stable Diffusion v1.4 and Stable Diffusion v2.1. Our findings underscore the fragility of composing incremental model updates, and raise serious new concerns about current approaches to ensuring the safety and alignment of text-to-image diffusion models.
Submitted 10 February, 2025; v1 submitted 10 October, 2024;
originally announced October 2024.