Search | arXiv e-print repository

Learning IMU Bias with Diffusion Model

Authors: Shenghao Zhou, Saimouli Katragadda, Guoquan Huang

Abstract: Motion sensing and tracking with IMU data is essential for spatial intelligence, which however is challenging due to the presence of time-varying stochastic bias. IMU bias is affected by various factors such as temperature and vibration, making it highly complex and difficult to model analytically. Recent data-driven approaches using deep learning have shown promise in predicting bias from IMU rea… ▽ More Motion sensing and tracking with IMU data is essential for spatial intelligence, which however is challenging due to the presence of time-varying stochastic bias. IMU bias is affected by various factors such as temperature and vibration, making it highly complex and difficult to model analytically. Recent data-driven approaches using deep learning have shown promise in predicting bias from IMU readings. However, these methods often treat the task as a regression problem, overlooking the stochatic nature of bias. In contrast, we model bias, conditioned on IMU readings, as a probabilistic distribution and design a conditional diffusion model to approximate this distribution. Through this approach, we achieve improved performance and make predictions that align more closely with the known behavior of bias. △ Less

Submitted 16 May, 2025; originally announced May 2025.

Comments: accepted to ICRA 2025

arXiv:2505.05381 [pdf, other]

Denoising Diffusion Probabilistic Models for Coastal Inundation Forecasting

Authors: Kazi Ashik Islam, Zakaria Mehrab, Mahantesh Halappanavar, Henning Mortveit, Sridhar Katragadda, Jon Derek Loftis, Madhav Marathe

Abstract: Coastal flooding poses significant risks to communities, necessitating fast and accurate forecasting methods to mitigate potential damage. To approach this problem, we present DIFF-FLOOD, a probabilistic spatiotemporal forecasting method designed based on denoising diffusion models. DIFF-FLOOD predicts inundation level at a location by taking both spatial and temporal context into account. It util… ▽ More Coastal flooding poses significant risks to communities, necessitating fast and accurate forecasting methods to mitigate potential damage. To approach this problem, we present DIFF-FLOOD, a probabilistic spatiotemporal forecasting method designed based on denoising diffusion models. DIFF-FLOOD predicts inundation level at a location by taking both spatial and temporal context into account. It utilizes inundation levels at neighboring locations and digital elevation data as spatial context. Inundation history from a context time window, together with additional co-variates are used as temporal context. Convolutional neural networks and cross-attention mechanism are then employed to capture the spatiotemporal dynamics in the data. We trained and tested DIFF-FLOOD on coastal inundation data from the Eastern Shore of Virginia, a region highly impacted by coastal flooding. Our results show that, DIFF-FLOOD outperforms existing forecasting methods in terms of prediction performance (6% to 64% improvement in terms of two performance metrics) and scalability. △ Less

Submitted 8 May, 2025; originally announced May 2025.

arXiv:2503.09447 [pdf, other]

Online Language Splatting

Authors: Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo, Xinyu Huang, Guoquan Huang, Liu Ren

Abstract: To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations using 3D Gaussian Splatting (GS), these approaches rely on computationa… ▽ More To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations using 3D Gaussian Splatting (GS), these approaches rely on computationally intensive offline preprocessing of language features for each input image, limiting adaptability to new environments. In this work, we introduce Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without requiring pre-generated language features. The key challenge lies in efficiently fusing high-dimensional language features into 3D representations while balancing the computation speed, memory usage, rendering quality and open-vocabulary capability. To this end, we innovatively design: (1) a high-resolution CLIP embedding module capable of generating detailed language feature maps in 18ms per frame, (2) a two-stage online auto-encoder that compresses 768-dimensional CLIP features to 15 dimensions while preserving open-vocabulary capabilities, and (3) a color-language disentangled optimization approach to improve rendering quality. Experimental results show that our online method not only surpasses the state-of-the-art offline methods in accuracy but also achieves more than 40x efficiency boost, demonstrating the potential for dynamic and interactive AI applications. △ Less

Submitted 12 March, 2025; originally announced March 2025.

arXiv:2501.15321 [pdf, other]

Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification

Authors: Abdullah Mazhar, Zuhair hasan shaik, Aseem Srivastava, Polly Ruhnke, Lavanya Vaddavalli, Sri Keshav Katragadda, Shweta Yadav, Md Shad Akhtar

Abstract: The expression of mental health symptoms through non-traditional means, such as memes, has gained remarkable attention over the past few years, with users often highlighting their mental health struggles through figurative intricacies within memes. While humans rely on commonsense knowledge to interpret these complex expressions, current Multimodal Language Models (MLMs) struggle to capture these… ▽ More The expression of mental health symptoms through non-traditional means, such as memes, has gained remarkable attention over the past few years, with users often highlighting their mental health struggles through figurative intricacies within memes. While humans rely on commonsense knowledge to interpret these complex expressions, current Multimodal Language Models (MLMs) struggle to capture these figurative aspects inherent in memes. To address this gap, we introduce a novel dataset, AxiOM, derived from the GAD anxiety questionnaire, which categorizes memes into six fine-grained anxiety symptoms. Next, we propose a commonsense and domain-enriched framework, M3H, to enhance MLMs' ability to interpret figurative language and commonsense knowledge. The overarching goal remains to first understand and then classify the mental health symptoms expressed in memes. We benchmark M3H against 6 competitive baselines (with 20 variations), demonstrating improvements in both quantitative and qualitative metrics, including a detailed human evaluation. We observe a clear improvement of 4.20% and 4.66% on weighted-F1 metric. To assess the generalizability, we perform extensive experiments on a public dataset, RESTORE, for depressive symptom identification, presenting an extensive ablation study that highlights the contribution of each module in both datasets. Our findings reveal limitations in existing models and the advantage of employing commonsense to enhance figurative understanding. △ Less

Submitted 25 January, 2025; originally announced January 2025.

Comments: Accepted for oral presentation at The Web Conference (WWW) 2025

arXiv:2401.12509 [pdf]

Digital cloning of online social networks for language-sensitive agent-based modeling of misinformation spread

Authors: Prateek Puri, Gabriel Hassler, Anton Shenk, Sai Katragadda

Abstract: We develop a simulation framework for studying misinformation spread within online social networks that blends agent-based modeling and natural language processing techniques. While many other agent-based simulations exist in this space, questions over their fidelity and generalization to existing networks in part hinders their ability to provide actionable insights. To partially address these con… ▽ More We develop a simulation framework for studying misinformation spread within online social networks that blends agent-based modeling and natural language processing techniques. While many other agent-based simulations exist in this space, questions over their fidelity and generalization to existing networks in part hinders their ability to provide actionable insights. To partially address these concerns, we create a 'digital clone' of a known misinformation sharing network by downloading social media histories for over ten thousand of its users. We parse these histories to both extract the structure of the network and model the nuanced ways in which information is shared and spread among its members. Unlike many other agent-based methods in this space, information sharing between users in our framework is sensitive to topic of discussion, user preferences, and online community dynamics. To evaluate the fidelity of our method, we seed our cloned network with a set of posts recorded in the base network and compare propagation dynamics between the two, observing reasonable agreement across the twin networks over a variety of metrics. Lastly, we explore how the cloned network may serve as a flexible, low-cost testbed for misinformation countermeasure evaluation and red teaming analysis. We hope the tools explored here augment existing efforts in the space and unlock new opportunities for misinformation countermeasure evaluation, a field that may become increasingly important to consider with the anticipated rise of misinformation campaigns fueled by generative artificial intelligence. △ Less

Submitted 23 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2309.14134 [pdf, other]

One-Class Classification for Intrusion Detection on Vehicular Networks

Authors: Jake Guidry, Fahad Sohrab, Raju Gottumukkala, Satya Katragadda, Moncef Gabbouj

Abstract: Controller Area Network bus systems within vehicular networks are not equipped with the tools necessary to ward off and protect themselves from modern cyber-security threats. Work has been done on using machine learning methods to detect and report these attacks, but common methods are not robust towards unknown attacks. These methods usually rely on there being a sufficient representation of atta… ▽ More Controller Area Network bus systems within vehicular networks are not equipped with the tools necessary to ward off and protect themselves from modern cyber-security threats. Work has been done on using machine learning methods to detect and report these attacks, but common methods are not robust towards unknown attacks. These methods usually rely on there being a sufficient representation of attack data, which may not be available due to there either not being enough data present to adequately represent its distribution or the distribution itself is too diverse in nature for there to be a sufficient representation of it. With the use of one-class classification methods, this issue can be mitigated as only normal data is required to train a model for the detection of anomalous instances. Research has been done on the efficacy of these methods, most notably One-Class Support Vector Machine and Support Vector Data Description, but many new extensions of these works have been proposed and have yet to be tested for injection attacks in vehicular networks. In this paper, we investigate the performance of various state-of-the-art one-class classification methods for detecting injection attacks on Controller Area Network bus traffic. We investigate the effectiveness of these techniques on attacks launched on Controller Area Network buses from two different vehicles during normal operation and while being attacked. We observe that the Subspace Support Vector Data Description method outperformed all other tested methods with a Gmean of about 85%. △ Less

Submitted 25 September, 2023; originally announced September 2023.

Comments: 7 pages, 2 figures, 4 tables. Accepted at IEEE Symposium Series on Computational Intelligence 2023

arXiv:2309.09295 [pdf, other]

NeRF-VINS: A Real-time Neural Radiance Field Map-based Visual-Inertial Navigation System

Authors: Saimouli Katragadda, Woosik Lee, Yuxiang Peng, Patrick Geneva, Chuchu Chen, Chao Guo, Mingyang Li, Guoquan Huang

Abstract: Achieving efficient and consistent localization a prior map remains challenging in robotics. Conventional keyframe-based approaches often suffers from sub-optimal viewpoints due to limited field of view (FOV) and/or constrained motion, thus degrading the localization performance. To address this issue, we design a real-time tightly-coupled Neural Radiance Fields (NeRF)-aided visual-inertial naviga… ▽ More Achieving efficient and consistent localization a prior map remains challenging in robotics. Conventional keyframe-based approaches often suffers from sub-optimal viewpoints due to limited field of view (FOV) and/or constrained motion, thus degrading the localization performance. To address this issue, we design a real-time tightly-coupled Neural Radiance Fields (NeRF)-aided visual-inertial navigation system (VINS). In particular, by effectively leveraging the NeRF's potential to synthesize novel views, the proposed NeRF-VINS overcomes the limitations of traditional keyframe-based maps (with limited views) and optimally fuses IMU, monocular images, and synthetically rendered images within an efficient filter-based framework. This tightly-coupled fusion enables efficient 3D motion tracking with bounded errors. We extensively compare the proposed NeRF-VINS against the state-of-the-art methods that use prior map information and demonstrate its ability to perform real-time localization, at over 10 Hz, on a resource-constrained Jetson AGX Orin embedded platform. △ Less

Submitted 7 March, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: 6 pages, 7 figures

arXiv:2302.14567 [pdf, other]

Active Learning with Combinatorial Coverage

Authors: Sai Prathyush Katragadda, Tyler Cody, Peter Beling, Laura Freeman

Abstract: Active learning is a practical field of machine learning that automates the process of selecting which data to label. Current methods are effective in reducing the burden of data labeling but are heavily model-reliant. This has led to the inability of sampled data to be transferred to new models as well as issues with sampling bias. Both issues are of crucial concern in machine learning deployment… ▽ More Active learning is a practical field of machine learning that automates the process of selecting which data to label. Current methods are effective in reducing the burden of data labeling but are heavily model-reliant. This has led to the inability of sampled data to be transferred to new models as well as issues with sampling bias. Both issues are of crucial concern in machine learning deployment. We propose active learning methods utilizing combinatorial coverage to overcome these issues. The proposed methods are data-centric, as opposed to model-centric, and through our experiments we show that the inclusion of coverage in active learning leads to sampling data that tends to be the best in transferring to better performing models and has a competitive sampling bias compared to benchmark methods. △ Less

Submitted 28 February, 2023; originally announced February 2023.

Comments: Accepted 2022 IEEE International Conference on Machine Learning and Applications (IEEE ICMLA)

arXiv:2209.13542 [pdf, other]

EmpathicSchool: A multimodal dataset for real-time facial expressions and physiological data analysis under different stress conditions

Authors: Majid Hosseini, Fahad Sohrab, Raju Gottumukkala, Ravi Teja Bhupatiraju, Satya Katragadda, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj

Abstract: Affective computing has garnered researchers' attention and interest in recent years as there is a need for AI systems to better understand and react to human emotions. However, analyzing human emotions, such as mood or stress, is quite complex. While various stress studies use facial expressions and wearables, most existing datasets rely on processing data from a single modality. This paper prese… ▽ More Affective computing has garnered researchers' attention and interest in recent years as there is a need for AI systems to better understand and react to human emotions. However, analyzing human emotions, such as mood or stress, is quite complex. While various stress studies use facial expressions and wearables, most existing datasets rely on processing data from a single modality. This paper presents EmpathicSchool, a novel dataset that captures facial expressions and the associated physiological signals, such as heart rate, electrodermal activity, and skin temperature, under different stress levels. The data was collected from 20 participants at different sessions for 26 hours. The data includes nine different signal types, including both computer vision and physiological features that can be used to detect stress. In addition, various experiments were conducted to validate the signal quality. △ Less

Submitted 29 August, 2022; originally announced September 2022.

arXiv:2108.07689 [pdf]

A multimodal sensor dataset for continuous stress detection of nurses in a hospital

Authors: Seyedmajid Hosseini, Satya Katragadda, Ravi Teja Bhupatiraju, Ziad Ashkar, Christoph W. Borst, Kenneth Cochran, Raju Gottumukkala

Abstract: Advances in wearable technologies provide the opportunity to monitor many physiological variables continuously. Stress detection has gained increased attention in recent years, mainly because early stress detection can help individuals better manage health to minimize the negative impacts of long-term stress exposure. This paper provides a unique stress detection dataset created in a natural worki… ▽ More Advances in wearable technologies provide the opportunity to monitor many physiological variables continuously. Stress detection has gained increased attention in recent years, mainly because early stress detection can help individuals better manage health to minimize the negative impacts of long-term stress exposure. This paper provides a unique stress detection dataset created in a natural working environment in a hospital. This dataset is a collection of biometric data of nurses during the COVID-19 outbreak. Studying stress in a work environment is complex due to many social, cultural, and psychological factors in dealing with stressful conditions. Therefore, we captured both the physiological data and associated context pertaining to the stress events. We monitored specifc physiological variables such as electrodermal activity, Heart Rate, and skin temperature of the nurse subjects. A periodic smartphone-administered survey also captured the contributing factors for the detected stress events. A database containing the signals, stress events, and survey responses is publicly available on Dryad. △ Less

Submitted 1 June, 2022; v1 submitted 25 July, 2021; originally announced August 2021.

Comments: 14 pages, 9 images

arXiv:1701.08752 [pdf, other]

doi 10.1364/JOSAB.34.000C36

A dual-trap system for the study of charged rotating graphene nanoplatelets in high vacuum

Authors: Joyce E. Coppock, Pavel Nagornykh, Jacob P. J. Murphy, I. S. McAdams, Saimouli Katragadda, B. E. Kane

Abstract: We discuss the design and implementation of a system for generating charged multilayer graphene nanoplatelets and introducing a nanoplatelet into a quadrupole ion trap in high vacuum. Levitation decouples the platelet from its environment and enables sensitive mechanical and magnetic measurements. The platelets are generated via liquid exfoliation of graphite pellets and charged via electrospray i… ▽ More We discuss the design and implementation of a system for generating charged multilayer graphene nanoplatelets and introducing a nanoplatelet into a quadrupole ion trap in high vacuum. Levitation decouples the platelet from its environment and enables sensitive mechanical and magnetic measurements. The platelets are generated via liquid exfoliation of graphite pellets and charged via electrospray ionization. A single platelet is trapped at a pressure of several hundred millitorr and transferred to a trap in a second chamber, which is pumped to UHV pressures for further study. △ Less

Submitted 30 January, 2017; originally announced January 2017.

Showing 1–11 of 11 results for author: Katragadda, S