-
Learning IMU Bias with Diffusion Model
Authors:
Shenghao Zhou,
Saimouli Katragadda,
Guoquan Huang
Abstract:
Motion sensing and tracking with IMU data is essential for spatial intelligence, which, however, is challenging due to the presence of time-varying stochastic bias. IMU bias is affected by various factors such as temperature and vibration, making it highly complex and difficult to model analytically. Recent data-driven approaches using deep learning have shown promise in predicting bias from IMU readings. However, these methods often treat the task as a regression problem, overlooking the stochastic nature of bias. In contrast, we model bias, conditioned on IMU readings, as a probabilistic distribution and design a conditional diffusion model to approximate this distribution. Through this approach, we achieve improved performance and make predictions that align more closely with the known behavior of bias.
Submitted 16 May, 2025;
originally announced May 2025.
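The abstract above frames bias estimation as learning the distribution of bias conditioned on IMU readings with a conditional diffusion model, but it gives no architecture or noise schedule. For orientation only, the following NumPy sketch shows the standard DDPM ingredients such a model typically rests on: a linear noise schedule, the closed-form forward noising x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, and the epsilon-prediction training loss with the IMU window passed as the condition. The 6-D bias vector (3 gyroscope + 3 accelerometer axes), the 200-sample window, the schedule constants, and the `denoiser` callable are all assumptions for illustration, not details from the paper.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns the cumulative products abar_t = prod(1 - beta_s)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(bias0, t, alpha_bars, rng):
    """Forward noising of a clean bias vector to step t; also returns the noise used."""
    eps = rng.standard_normal(bias0.shape)
    abar = alpha_bars[t]
    return np.sqrt(abar) * bias0 + np.sqrt(1.0 - abar) * eps, eps

def denoising_loss(denoiser, bias0, imu_window, alpha_bars, rng):
    """One Monte Carlo sample of the epsilon-prediction loss.

    `denoiser(x_t, t, imu_window)` stands in for a hypothetical conditional
    network that predicts the added noise from the noisy bias and the IMU window.
    """
    t = int(rng.integers(len(alpha_bars)))  # uniform random diffusion step
    x_t, eps = q_sample(bias0, t, alpha_bars, rng)
    eps_hat = denoiser(x_t, t, imu_window)
    return float(np.mean((eps_hat - eps) ** 2))

# Usage with hypothetical data and a trivial "model" that always predicts zero noise.
rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
bias0 = np.array([0.01, -0.02, 0.005, 0.1, -0.05, 0.02])  # hypothetical gyro + accel bias
imu_window = rng.standard_normal((200, 6))                # hypothetical raw IMU window
zero_model = lambda x_t, t, cond: np.zeros_like(x_t)
loss = denoising_loss(zero_model, bias0, imu_window, alpha_bars, rng)
```

At sampling time a trained denoiser would be applied in reverse from pure noise, conditioned on the same IMU window, so repeated draws give a distribution of plausible biases rather than a single regressed value.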
-
Denoising Diffusion Probabilistic Models for Coastal Inundation Forecasting
Authors:
Kazi Ashik Islam,
Zakaria Mehrab,
Mahantesh Halappanavar,
Henning Mortveit,
Sridhar Katragadda,
Jon Derek Loftis,
Madhav Marathe
Abstract:
Coastal flooding poses significant risks to communities, necessitating fast and accurate forecasting methods to mitigate potential damage. To approach this problem, we present DIFF-FLOOD, a probabilistic spatiotemporal forecasting method designed based on denoising diffusion models. DIFF-FLOOD predicts inundation level at a location by taking both spatial and temporal context into account. It utilizes inundation levels at neighboring locations and digital elevation data as spatial context. Inundation history from a context time window, together with additional covariates, is used as temporal context. Convolutional neural networks and a cross-attention mechanism are then employed to capture the spatiotemporal dynamics in the data. We trained and tested DIFF-FLOOD on coastal inundation data from the Eastern Shore of Virginia, a region highly impacted by coastal flooding. Our results show that DIFF-FLOOD outperforms existing forecasting methods in terms of prediction performance (6% to 64% improvement in terms of two performance metrics) and scalability.
Submitted 8 May, 2025;
originally announced May 2025.
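DIFF-FLOOD is described as combining convolutional networks with cross-attention to capture spatiotemporal dynamics, but the abstract does not give the architecture. The following is only a generic sketch of scaled dot-product cross-attention in NumPy. It assumes (our assumption, not the authors') that temporal features at the target location act as queries while spatial features from neighboring locations supply keys and values; the projection matrices `Wq`, `Wk`, `Wv` and all shapes are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    """Scaled dot-product cross-attention.

    queries: (n_q, d_in) features that attend (hypothetically: time steps at the target cell)
    context: (n_c, d_in) features attended to (hypothetically: neighboring cells)
    Returns the attended output (n_q, d_v) and the attention weights (n_q, n_c).
    """
    Q = queries @ Wq                     # (n_q, d_k)
    K = context @ Wk                     # (n_c, d_k)
    V = context @ Wv                     # (n_c, d_v)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = softmax(scores, axis=-1)   # each query's weights over the context sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
d_in, d_k, d_v = 8, 16, 16
queries = rng.standard_normal((24, d_in))  # e.g. 24 hypothetical time steps
context = rng.standard_normal((9, d_in))   # e.g. a hypothetical 3x3 neighborhood
Wq, Wk, Wv = (rng.standard_normal((d_in, d)) for d in (d_k, d_k, d_v))
out, weights = cross_attention(queries, context, Wq, Wk, Wv)
```

Cross-attention differs from self-attention only in that queries and keys/values come from different sources, which is what lets one stream of context (here, hypothetically, spatial) modulate another.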
-
Online Language Splatting
Authors:
Saimouli Katragadda,
Cho-Ying Wu,
Yuliang Guo,
Xinyu Huang,
Guoquan Huang,
Liu Ren
Abstract:
To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations using 3D Gaussian Splatting (GS), these approaches rely on computationally intensive offline preprocessing of language features for each input image, limiting adaptability to new environments. In this work, we introduce Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without requiring pre-generated language features. The key challenge lies in efficiently fusing high-dimensional language features into 3D representations while balancing the computation speed, memory usage, rendering quality and open-vocabulary capability. To this end, we innovatively design: (1) a high-resolution CLIP embedding module capable of generating detailed language feature maps in 18ms per frame, (2) a two-stage online auto-encoder that compresses 768-dimensional CLIP features to 15 dimensions while preserving open-vocabulary capabilities, and (3) a color-language disentangled optimization approach to improve rendering quality. Experimental results show that our online method not only surpasses the state-of-the-art offline methods in accuracy but also achieves more than 40x efficiency boost, demonstrating the potential for dynamic and interactive AI applications.
Submitted 12 March, 2025;
originally announced March 2025.
-
Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification
Authors:
Abdullah Mazhar,
Zuhair hasan shaik,
Aseem Srivastava,
Polly Ruhnke,
Lavanya Vaddavalli,
Sri Keshav Katragadda,
Shweta Yadav,
Md Shad Akhtar
Abstract:
The expression of mental health symptoms through non-traditional means, such as memes, has gained remarkable attention over the past few years, with users often highlighting their mental health struggles through figurative intricacies within memes. While humans rely on commonsense knowledge to interpret these complex expressions, current Multimodal Language Models (MLMs) struggle to capture these figurative aspects inherent in memes. To address this gap, we introduce a novel dataset, AxiOM, derived from the GAD anxiety questionnaire, which categorizes memes into six fine-grained anxiety symptoms. Next, we propose a commonsense and domain-enriched framework, M3H, to enhance MLMs' ability to interpret figurative language and commonsense knowledge. The overarching goal remains to first understand and then classify the mental health symptoms expressed in memes. We benchmark M3H against 6 competitive baselines (with 20 variations), demonstrating improvements in both quantitative and qualitative metrics, including a detailed human evaluation. We observe a clear improvement of 4.20% and 4.66% on the weighted-F1 metric. To assess generalizability, we perform extensive experiments on a public dataset, RESTORE, for depressive symptom identification, presenting an extensive ablation study that highlights the contribution of each module in both datasets. Our findings reveal limitations in existing models and the advantage of employing commonsense to enhance figurative understanding.
Submitted 25 January, 2025;
originally announced January 2025.
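The M3H results above are reported as weighted-F1. That metric is usually defined as the per-class F1 scores averaged with weights proportional to each class's support in the ground truth (the behavior of scikit-learn's `f1_score(..., average="weighted")`). The small dependency-free sketch below implements that generic definition; it is not the authors' evaluation code, and the toy labels are hypothetical.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    total = 0.0
    for label in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += support[label] * f1  # labels absent from y_true get weight 0
    return total / len(y_true)

# Hypothetical 3-class example: per-class F1 = 0.5, 2/3, 1.0 with supports 2, 3, 1.
score = weighted_f1([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 0, 2])  # (2*0.5 + 3*(2/3) + 1*1) / 6 ≈ 0.6667
```

Weighting by support means frequent symptom classes dominate the score, which is why weighted-F1 can differ noticeably from macro-F1 on imbalanced label sets.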
-
Digital cloning of online social networks for language-sensitive agent-based modeling of misinformation spread
Authors:
Prateek Puri,
Gabriel Hassler,
Anton Shenk,
Sai Katragadda
Abstract:
We develop a simulation framework for studying misinformation spread within online social networks that blends agent-based modeling and natural language processing techniques. While many other agent-based simulations exist in this space, questions over their fidelity and generalization to existing networks in part hinder their ability to provide actionable insights. To partially address these concerns, we create a 'digital clone' of a known misinformation sharing network by downloading social media histories for over ten thousand of its users. We parse these histories to both extract the structure of the network and model the nuanced ways in which information is shared and spread among its members. Unlike many other agent-based methods in this space, information sharing between users in our framework is sensitive to topic of discussion, user preferences, and online community dynamics. To evaluate the fidelity of our method, we seed our cloned network with a set of posts recorded in the base network and compare propagation dynamics between the two, observing reasonable agreement across the twin networks over a variety of metrics. Lastly, we explore how the cloned network may serve as a flexible, low-cost testbed for misinformation countermeasure evaluation and red teaming analysis. We hope the tools explored here augment existing efforts in the space and unlock new opportunities for misinformation countermeasure evaluation, a field that may become increasingly important to consider with the anticipated rise of misinformation campaigns fueled by generative artificial intelligence.
Submitted 23 January, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
One-Class Classification for Intrusion Detection on Vehicular Networks
Authors:
Jake Guidry,
Fahad Sohrab,
Raju Gottumukkala,
Satya Katragadda,
Moncef Gabbouj
Abstract:
Controller Area Network bus systems within vehicular networks are not equipped with the tools necessary to ward off and protect themselves from modern cyber-security threats. Work has been done on using machine learning methods to detect and report these attacks, but common methods are not robust to unknown attacks. These methods usually rely on a sufficient representation of attack data, which may not be available, either because there is not enough data to adequately represent the attack distribution or because the distribution itself is too diverse to be represented sufficiently. One-class classification methods can mitigate this issue, as only normal data is required to train a model for detecting anomalous instances. Research has been done on the efficacy of these methods, most notably One-Class Support Vector Machine and Support Vector Data Description, but many newer extensions of these works have been proposed and have yet to be tested for injection attacks in vehicular networks. In this paper, we investigate the performance of various state-of-the-art one-class classification methods for detecting injection attacks on Controller Area Network bus traffic. We investigate the effectiveness of these techniques on attacks launched on Controller Area Network buses from two different vehicles during normal operation and while being attacked. We observe that the Subspace Support Vector Data Description method outperformed all other tested methods with a G-mean of about 85%.
Submitted 25 September, 2023;
originally announced September 2023.
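The paper above compares one-class methods such as One-Class SVM, SVDD and Subspace SVDD, trained on normal CAN traffic only, and reports G-mean. The sketch below is deliberately much simpler than any of those methods: it fits a hypersphere whose center is the mean of the normal training vectors and whose radius is a distance quantile. That captures the "train on normal data, flag points outside the boundary" idea without SVDD's optimization or any subspace projection. G-mean is computed with its usual definition, the geometric mean of the true-positive rate and the true-negative rate (the abstract does not spell it out). The class name `CentroidHypersphere`, the `quantile` parameter and the synthetic feature vectors are hypothetical.

```python
import numpy as np

class CentroidHypersphere:
    """Toy one-class detector trained on normal data only (not SVDD's optimization)."""

    def __init__(self, quantile: float = 0.95):
        self.quantile = quantile  # fraction of normal training points kept inside the sphere

    def fit(self, X_normal: np.ndarray) -> "CentroidHypersphere":
        self.center_ = X_normal.mean(axis=0)
        dists = np.linalg.norm(X_normal - self.center_, axis=1)
        self.radius_ = float(np.quantile(dists, self.quantile))
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # 1 = anomalous (outside the sphere), 0 = normal
        dists = np.linalg.norm(X - self.center_, axis=1)
        return (dists > self.radius_).astype(int)

def g_mean(y_true, y_pred) -> float:
    """Geometric mean of true-positive rate (attacks caught) and true-negative rate."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return float(np.sqrt(tpr * tnr))

# Hypothetical synthetic "CAN frame features": normal traffic near the origin,
# injected frames shifted far away. Real CAN features would need engineering.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((500, 2))
X_test = np.vstack([rng.standard_normal((100, 2)), rng.standard_normal((20, 2)) + 10.0])
y_test = np.array([0] * 100 + [1] * 20)
model = CentroidHypersphere(quantile=0.95).fit(X_train)
score = g_mean(y_test, model.predict(X_test))
```

G-mean is a natural choice here because it penalizes a detector that scores well on one class only: flagging everything gives a TNR of 0 and therefore a G-mean of 0, regardless of how many attacks are caught.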
-
NeRF-VINS: A Real-time Neural Radiance Field Map-based Visual-Inertial Navigation System
Authors:
Saimouli Katragadda,
Woosik Lee,
Yuxiang Peng,
Patrick Geneva,
Chuchu Chen,
Chao Guo,
Mingyang Li,
Guoquan Huang
Abstract:
Achieving efficient and consistent localization with a prior map remains challenging in robotics. Conventional keyframe-based approaches often suffer from sub-optimal viewpoints due to limited field of view (FOV) and/or constrained motion, thus degrading the localization performance. To address this issue, we design a real-time tightly-coupled Neural Radiance Fields (NeRF)-aided visual-inertial navigation system (VINS). In particular, by effectively leveraging NeRF's potential to synthesize novel views, the proposed NeRF-VINS overcomes the limitations of traditional keyframe-based maps (with limited views) and optimally fuses IMU, monocular images, and synthetically rendered images within an efficient filter-based framework. This tightly-coupled fusion enables efficient 3D motion tracking with bounded errors. We extensively compare the proposed NeRF-VINS against the state-of-the-art methods that use prior map information and demonstrate its ability to perform real-time localization, at over 10 Hz, on a resource-constrained Jetson AGX Orin embedded platform.
Submitted 7 March, 2024; v1 submitted 17 September, 2023;
originally announced September 2023.
-
Active Learning with Combinatorial Coverage
Authors:
Sai Prathyush Katragadda,
Tyler Cody,
Peter Beling,
Laura Freeman
Abstract:
Active learning is a practical field of machine learning that automates the process of selecting which data to label. Current methods are effective in reducing the burden of data labeling but are heavily model-reliant. This has led to the inability of sampled data to be transferred to new models as well as issues with sampling bias. Both issues are of crucial concern in machine learning deployment. We propose active learning methods utilizing combinatorial coverage to overcome these issues. The proposed methods are data-centric, as opposed to model-centric, and through our experiments we show that the inclusion of coverage in active learning leads to sampling data that tends to be the best in transferring to better performing models and has a competitive sampling bias compared to benchmark methods.
Submitted 28 February, 2023;
originally announced February 2023.
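The abstract above does not define combinatorial coverage. A common definition, borrowed from combinatorial testing, is t-way coverage: the fraction of all possible value combinations, over every t-subset of (discretized) features, that actually occur in a dataset. The sketch below computes that quantity and adds a hypothetical greedy selector that picks the unlabeled rows contributing the most not-yet-covered t-way combinations. It illustrates the coverage idea under those assumptions and is not the authors' active learning method; `t_way_coverage`, `greedy_coverage_batch` and the toy binary domains are illustrative names.

```python
from itertools import combinations

def t_way_combos(row, t):
    """All (column-tuple, value-tuple) pairs that a single row covers."""
    return {(cols, tuple(row[c] for c in cols)) for cols in combinations(range(len(row)), t)}

def t_way_coverage(rows, domains, t=2):
    """Fraction of all possible t-way value combinations present in `rows`.

    `domains[i]` lists the allowed (discretized) values of feature i; rows are
    assumed to use only those values, otherwise the ratio could exceed 1.
    """
    covered = set()
    for row in rows:
        covered |= t_way_combos(row, t)
    total = 0
    for cols in combinations(range(len(domains)), t):
        n = 1
        for c in cols:
            n *= len(domains[c])
        total += n
    return len(covered) / total

def greedy_coverage_batch(pool, labeled, t=2, k=1):
    """Pick k pool indices, each time taking the row that adds the most uncovered combos."""
    covered = set()
    for row in labeled:
        covered |= t_way_combos(row, t)
    chosen = []
    for _ in range(min(k, len(pool))):
        best, best_gain = None, -1
        for i, row in enumerate(pool):
            if i in chosen:
                continue
            gain = len(t_way_combos(row, t) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
        covered |= t_way_combos(pool[best], t)
    return chosen

# Hypothetical example with three binary features.
domains = [[0, 1]] * 3
half = t_way_coverage([(0, 0, 0), (1, 1, 1)], domains, t=2)       # 6 of 12 pairs -> 0.5
picked = greedy_coverage_batch([(0, 0, 0), (1, 1, 1), (0, 0, 1)], [(0, 0, 0)], t=2, k=2)
```

Because the selection score depends only on feature values, not on any model's predictions, a batch chosen this way does not inherit one model's biases, which is the data-centric property the abstract emphasizes.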
-
EmpathicSchool: A multimodal dataset for real-time facial expressions and physiological data analysis under different stress conditions
Authors:
Majid Hosseini,
Fahad Sohrab,
Raju Gottumukkala,
Ravi Teja Bhupatiraju,
Satya Katragadda,
Jenni Raitoharju,
Alexandros Iosifidis,
Moncef Gabbouj
Abstract:
Affective computing has garnered researchers' attention and interest in recent years as there is a need for AI systems to better understand and react to human emotions. However, analyzing human emotions, such as mood or stress, is quite complex. While various stress studies use facial expressions and wearables, most existing datasets rely on processing data from a single modality. This paper presents EmpathicSchool, a novel dataset that captures facial expressions and the associated physiological signals, such as heart rate, electrodermal activity, and skin temperature, under different stress levels. The data was collected from 20 participants at different sessions for 26 hours. The data includes nine different signal types, including both computer vision and physiological features that can be used to detect stress. In addition, various experiments were conducted to validate the signal quality.
Submitted 29 August, 2022;
originally announced September 2022.
-
A multimodal sensor dataset for continuous stress detection of nurses in a hospital
Authors:
Seyedmajid Hosseini,
Satya Katragadda,
Ravi Teja Bhupatiraju,
Ziad Ashkar,
Christoph W. Borst,
Kenneth Cochran,
Raju Gottumukkala
Abstract:
Advances in wearable technologies provide the opportunity to monitor many physiological variables continuously. Stress detection has gained increased attention in recent years, mainly because early stress detection can help individuals better manage health to minimize the negative impacts of long-term stress exposure. This paper provides a unique stress detection dataset created in a natural working environment in a hospital. This dataset is a collection of biometric data of nurses during the COVID-19 outbreak. Studying stress in a work environment is complex due to many social, cultural, and psychological factors in dealing with stressful conditions. Therefore, we captured both the physiological data and associated context pertaining to the stress events. We monitored specific physiological variables such as electrodermal activity, heart rate, and skin temperature of the nurse subjects. A periodic smartphone-administered survey also captured the contributing factors for the detected stress events. A database containing the signals, stress events, and survey responses is publicly available on Dryad.
Submitted 1 June, 2022; v1 submitted 25 July, 2021;
originally announced August 2021.
-
A dual-trap system for the study of charged rotating graphene nanoplatelets in high vacuum
Authors:
Joyce E. Coppock,
Pavel Nagornykh,
Jacob P. J. Murphy,
I. S. McAdams,
Saimouli Katragadda,
B. E. Kane
Abstract:
We discuss the design and implementation of a system for generating charged multilayer graphene nanoplatelets and introducing a nanoplatelet into a quadrupole ion trap in high vacuum. Levitation decouples the platelet from its environment and enables sensitive mechanical and magnetic measurements. The platelets are generated via liquid exfoliation of graphite pellets and charged via electrospray ionization. A single platelet is trapped at a pressure of several hundred millitorr and transferred to a trap in a second chamber, which is pumped to UHV pressures for further study.
Submitted 30 January, 2017;
originally announced January 2017.