-
Anomaly Detection in Cooperative Vehicle Perception Systems under Imperfect Communication
Authors: Ashish Bastola, Hao Wang, Abolfazl Razi
Abstract:
Anomaly detection is a critical requirement for ensuring safety in autonomous driving. In this work, we leverage cooperative perception to share information across nearby vehicles, enabling more accurate identification of, and consensus on, anomalous behaviors in complex traffic scenarios. To account for the real-world challenge of imperfect communication, we propose a cooperative-perception-based anomaly detection framework (CPAD), a robust architecture that remains effective under communication interruptions and thereby delivers reliable performance even in low-bandwidth settings. Since no multi-agent anomaly detection dataset exists for vehicle trajectories, we introduce a benchmark dataset of 15,000 scenarios comprising 90,000 trajectories, generated through rule-based vehicle dynamics analysis. Empirical results demonstrate that our approach outperforms standard anomaly classification methods in F1-score and AUC and shows strong robustness to agent connection interruptions.
Submitted 28 January, 2025;
originally announced January 2025.
-
Diffusion Prism: Enhancing Diversity and Morphology Consistency in Mask-to-Image Diffusion
Authors: Hao Wang, Xiwen Chen, Ashish Bastola, Jiayou Qin, Abolfazl Razi
Abstract:
The emergence of generative AI and controllable diffusion has made image-to-image synthesis increasingly practical and efficient. However, when input images are sparse and low in entropy, the inherent characteristics of diffusion models often result in limited diversity. This constraint significantly interferes with data augmentation. To address this, we propose Diffusion Prism, a training-free framework that efficiently transforms binary masks into realistic and diverse samples while preserving morphological features. We find that a small amount of artificial noise significantly assists the image-denoising process. To demonstrate this mask-to-image concept, we use nano-dendritic patterns as an example and show the merit of our method compared to existing controllable diffusion models. Furthermore, we extend the proposed framework to other biological patterns, highlighting its potential applications across various fields.
Submitted 10 January, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
RobustFormer: Noise-Robust Pre-training for images and videos
Authors: Ashish Bastola, Nishant Luitel, Hao Wang, Danda Pani Paudel, Roshani Poudel, Abolfazl Razi
Abstract:
While deep learning models are powerful tools that have revolutionized many areas, they are also vulnerable to noise because they rely heavily on learning patterns and features from the exact details of clean data. Transformers, which have become the backbone of modern vision models, are no exception. Current Discrete Wavelet Transform (DWT)-based methods do not benefit from masked autoencoder (MAE) pre-training, since the inverse DWT (iDWT) introduced in these approaches is computationally inefficient and lacks compatibility with video inputs in transformer architectures.
In this work, we present RobustFormer, a method that overcomes these limitations by enabling noise-robust pre-training for both images and videos. It improves the efficiency of DWT-based methods by removing the need for computationally expensive iDWT steps and by simplifying the attention mechanism. To our knowledge, the proposed method is the first DWT-based method compatible with video inputs and masked pre-training. Our experiments show that MAE-based pre-training allows us to bypass the iDWT step, greatly reducing computation. Through extensive tests on benchmark datasets, RobustFormer achieves state-of-the-art results for both image and video tasks.
Submitted 20 November, 2024;
originally announced November 2024.
-
Motor Focus: Fast Ego-Motion Prediction for Assistive Visual Navigation
Authors: Hao Wang, Jiayou Qin, Xiwen Chen, Ashish Bastola, John Suchanek, Zihao Gong, Abolfazl Razi
Abstract:
Assistive visual navigation systems for visually impaired individuals have become increasingly popular thanks to the rise of mobile computing. Most of these devices work by translating visual information into voice commands. In complex scenarios where multiple objects are present, it is imperative to prioritize object detection and provide immediate notifications for key entities in specific directions. This creates the need to identify the observer's motion direction (ego-motion) by processing visual information alone, which is the key contribution of this paper. Specifically, we introduce Motor Focus, a lightweight image-based framework that predicts ego-motion, that is, the movement intentions of humans (and humanoid machines) based on their visual feeds, while filtering out camera motion without any camera calibration. To this end, we implement an optical flow-based pixel-wise temporal analysis method to compensate for the camera motion, with a Gaussian aggregation to smooth out the movement prediction area. To evaluate performance, we collect a dataset of 50 clips of pedestrian scenes in 5 different scenarios. We compare this framework against classical feature detectors such as SIFT and ORB. Our framework demonstrates its superiority in speed (> 40 FPS), accuracy (MAE = 60 pixels), and robustness (SNR = 23 dB), confirming its potential to enhance the usability of vision-based assistive navigation tools in complex environments.
Submitted 12 October, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
FedMIL: Federated-Multiple Instance Learning for Video Analysis with Optimized DPP Scheduling
Authors: Ashish Bastola, Hao Wang, Xiwen Chen, Abolfazl Razi
Abstract:
Many AI platforms, including traffic monitoring systems, use Federated Learning (FL) for decentralized sensor data processing in learning-based applications while preserving privacy and ensuring secure information transfer. On the other hand, applying supervised learning to large data samples, such as high-resolution images, requires intensive human labor to label different parts of a data sample. Multiple Instance Learning (MIL) alleviates this challenge by operating over labels assigned to the 'bag' of instances. In this paper, we introduce Federated Multiple-Instance Learning (FedMIL). This framework applies federated learning to boost the training performance in video-based MIL tasks such as vehicle accident detection using distributed CCTV networks. However, data sources in decentralized settings are not typically Independently and Identically Distributed (IID), making client selection imperative to collectively represent the entire dataset with minimal clients. To address this challenge, we propose DPPQ, a framework based on the Determinantal Point Process (DPP) with a quality-based kernel that selects the clients with the most diverse datasets. It achieves better performance than both random selection and current DPP-based client selection methods, even with less data utilization, in the majority of non-IID cases. This offers a significant advantage for deployment on edge devices with limited computational resources, providing a reliable solution for training AI models in massive smart sensor networks.
Submitted 25 March, 2024;
originally announced March 2024.
-
VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation
Authors: Hao Wang, Jiayou Qin, Ashish Bastola, Xiwen Chen, John Suchanek, Zihao Gong, Abolfazl Razi
Abstract:
This paper explores the potential of Large Language Models (LLMs) in zero-shot anomaly detection for safe visual navigation. With the assistance of the state-of-the-art real-time open-world object detection model Yolo-World and specialized prompts, the proposed framework can identify anomalies within camera-captured frames that include any possible obstacles, then generate concise, audio-delivered descriptions emphasizing abnormalities, assisting in safe visual navigation in complex circumstances. Moreover, our proposed framework leverages the advantages of LLMs and the open-vocabulary object detection model to achieve dynamic scenario switching, which allows users to transition smoothly from scene to scene and addresses a limitation of traditional visual navigation. Furthermore, this paper explores the performance contribution of different prompt components, provides a vision for future improvement in visual accessibility, and paves the way for LLMs in video anomaly detection and vision-language understanding.
Submitted 18 March, 2024;
originally announced March 2024.
-
FLAME Diffuser: Wildfire Image Synthesis using Mask Guided Diffusion
Authors: Hao Wang, Sayed Pedram Haeri Boroujeni, Xiwen Chen, Ashish Bastola, Huayu Li, Wenhui Zhu, Abolfazl Razi
Abstract:
Wildfires are a significant threat to ecosystems and human infrastructure, leading to widespread destruction and environmental degradation. Recent advancements in deep learning and generative models have enabled new methods for wildfire detection and monitoring. However, the scarcity of annotated wildfire images limits the development of robust models for these tasks. In this work, we present the FLAME Diffuser, a training-free, diffusion-based framework designed to generate realistic wildfire images with paired ground truth. Our framework uses augmented masks, sampled from real wildfire data, and applies Perlin noise to guide the generation of realistic flames. By controlling the placement of these elements within the image, we ensure precise integration while maintaining the original image's style. We evaluate the generated images using normalized Fréchet Inception Distance, CLIP Score, and a custom CLIP Confidence metric, demonstrating the high quality and realism of the synthesized wildfire images. In particular, incorporating Perlin noise significantly improved the quality of the synthesized images. The proposed method is particularly valuable for enhancing datasets used in downstream tasks such as wildfire detection and monitoring.
Submitted 30 September, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Driving Towards Inclusion: A Systematic Review of AI-powered Accessibility Enhancements for People with Disability in Autonomous Vehicles
Authors: Ashish Bastola, Hao Wang, Sayed Pedram Haeri Boroujeni, Julian Brinkley, Ata Jahangir Moshayedi, Abolfazl Razi
Abstract:
This paper provides a comprehensive and, to our knowledge, the first review of inclusive human-computer interaction (HCI) within autonomous vehicles (AVs) and human-driven cars with partial autonomy, emphasizing accessibility and user-centered design principles. We explore the current technologies and HCI systems designed to enhance passenger experience, particularly for individuals with accessibility needs. Key technologies discussed include brain-computer interfaces, anthropomorphic interaction, virtual reality, augmented reality, mode adaptation, voice-activated interfaces, haptic feedback, etc. Each technology is evaluated for its role in creating an inclusive in-vehicle environment. Furthermore, we highlight recent interface designs by leading companies and review emerging concepts and prototypes under development or testing, which show significant potential to address diverse accessibility requirements. Safety considerations, ethical concerns, and adoption of AVs are other major issues that require thorough investigation. Building on these findings, we propose an end-to-end design framework that addresses accessibility requirements across diverse user demographics, including older adults and individuals with physical or cognitive impairments. This work provides actionable insights for designers, researchers, and policymakers aiming to create safer and more comfortable environments in autonomous and regular vehicles accessible to all users.
Submitted 9 January, 2025; v1 submitted 25 January, 2024;
originally announced January 2024.
-
LLM-based Smart Reply (LSR): Enhancing Collaborative Performance with ChatGPT-mediated Smart Reply System
Authors: Ashish Bastola, Hao Wang, Judsen Hembree, Pooja Yadav, Zihao Gong, Emma Dixon, Abolfazl Razi, Nathan McNeese
Abstract:
Interactive user interfaces have increasingly explored AI's role in enhancing communication efficiency and productivity in collaborative tasks. The emergence of Large Language Models (LLMs) such as ChatGPT has revolutionized conversational agents, employing advanced deep learning techniques to generate context-aware, coherent, and personalized responses. Consequently, LLM-based AI assistants provide a more natural and efficient user experience across various scenarios. In this paper, we study how LLMs can be used to improve work efficiency in collaborative workplaces. Specifically, we present an LLM-based Smart Reply (LSR) system that uses ChatGPT to generate personalized responses in professional collaborative scenarios while adapting to context and communication style based on prior responses. Our two-step process involves generating a preliminary response type (e.g., Agree, Disagree) to provide a generalized direction for message generation, thus reducing response drafting time. We conducted an experiment in which participants completed simulated work tasks involving a Dual N-back test and subtask scheduling through Google Calendar while interacting with co-workers. Our findings indicate that the proposed LSR reduces overall workload, as measured by the NASA TLX, and improves work performance and productivity in the N-back task. We also provide a qualitative analysis based on participants' experiences, as well as design considerations that suggest future directions for improving such implementations.
Submitted 4 March, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.