Search | arXiv e-print repository

2COOOL: 2nd Workshop on the Challenge Of Out-Of-Label Hazards in Autonomous Driving

Authors: Ali K. AlShami, Ryan Rabinowitz, Maged Shoman, Jianwu Fang, Lukas Picek, Shao-Yuan Lo, Steve Cruz, Khang Nhut Lam, Nachiket Kamod, Lei-Lei Li, Jugal Kalita, Terrance E. Boult

Abstract: As the computer vision community advances autonomous driving algorithms, integrating vision-based insights with sensor data remains essential for improving perception, decision making, planning, prediction, simulation, and control. Yet we must ask: Why don't we have entirely safe self-driving cars yet? A key part of the answer lies in addressing novel scenarios, one of the most critical barriers t… ▽ More As the computer vision community advances autonomous driving algorithms, integrating vision-based insights with sensor data remains essential for improving perception, decision making, planning, prediction, simulation, and control. Yet we must ask: Why don't we have entirely safe self-driving cars yet? A key part of the answer lies in addressing novel scenarios, one of the most critical barriers to real-world deployment. Our 2COOOL workshop provides a dedicated forum for researchers and industry experts to push the state of the art in novelty handling, including out-of-distribution hazard detection, vision-language models for hazard understanding, new benchmarking and methodologies, and safe autonomous driving practices. The 2nd Workshop on the Challenge of Out-of-Label Hazards in Autonomous Driving (2COOOL) will be held at the International Conference on Computer Vision (ICCV) 2025 in Honolulu, Hawaii, on October 19, 2025. We aim to inspire the development of new algorithms and systems for hazard avoidance, drawing on ideas from anomaly detection, open-set recognition, open-vocabulary modeling, domain adaptation, and related fields. Building on the success of its inaugural edition at the Winter Conference on Applications of Computer Vision (WACV) 2025, the workshop will feature a mix of academic and industry participation. △ Less

Submitted 18 August, 2025; originally announced August 2025.

Comments: 11 pages, 2 figures, Accepted to ICCV 2025 Workshop on Out-of-Label Hazards in Autonomous Driving (2COOOL)

MSC Class: 68T45 (Machine vision and scene understanding) ACM Class: I.2.10; I.4.8

arXiv:2507.00951 [pdf, ps, other]

Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Authors: Rizwan Qureshi, Ranjan Sapkota, Abbas Shah, Amgad Muneer, Anas Zafar, Ashmal Vayani, Maged Shoman, Abdelrahman B. M. Eldaly, Kai Zhang, Ferhat Sadak, Shaina Raza, Xinqi Fan, Ravid Shwartz-Ziv, Hong Yan, Vinjia Jain, Aman Chadha, Manoj Karkee, Jia Wu, Seyedali Mirjalili

Abstract: Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level… ▽ More Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI. △ Less

Submitted 11 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

arXiv:2506.07055 [pdf, ps, other]

A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge

Authors: Tarique Dahri, Zulfiqar Ali Memon, Zhenyu Yu, Mohd. Yamani Idna Idris, Sheheryar Khan, Sadiq Ahmad, Maged Shoman, Saddam Aziz, Rizwan Qureshi

Abstract: We introduce Layered Self-Supervised Knowledge Distillation (LSSKD) framework for training compact deep learning models. Unlike traditional methods that rely on pre-trained teacher networks, our approach appends auxiliary classifiers to intermediate feature maps, generating diverse self-supervised knowledge and enabling one-to-one transfer across different network stages. Our method achieves an av… ▽ More We introduce Layered Self-Supervised Knowledge Distillation (LSSKD) framework for training compact deep learning models. Unlike traditional methods that rely on pre-trained teacher networks, our approach appends auxiliary classifiers to intermediate feature maps, generating diverse self-supervised knowledge and enabling one-to-one transfer across different network stages. Our method achieves an average improvement of 4.54\% over the state-of-the-art PS-KD method and a 1.14% gain over SSKD on CIFAR-100, with a 0.32% improvement on ImageNet compared to HASSKD. Experiments on Tiny ImageNet and CIFAR-100 under few-shot learning scenarios also achieve state-of-the-art results. These findings demonstrate the effectiveness of our approach in enhancing model generalization and performance without the need for large over-parameterized teacher networks. Importantly, at the inference stage, all auxiliary classifiers can be removed, yielding no extra computational cost. This makes our model suitable for deploying small language models on affordable low-computing devices. Owing to its lightweight design and adaptability, our framework is particularly suitable for multimodal sensing and cyber-physical environments that require efficient and responsive inference. LSSKD facilitates the development of intelligent agents capable of learning from limited sensory data under weak supervision. △ Less

Submitted 8 June, 2025; originally announced June 2025.

arXiv:2502.08650 [pdf, ps, other]

Who is Responsible? The Data, Models, Users or Regulations? A Comprehensive Survey on Responsible Generative AI for a Sustainable Future

Authors: Shaina Raza, Rizwan Qureshi, Anam Zahid, Safiullah Kamawal, Ferhat Sadak, Joseph Fioresi, Muhammaed Saeed, Ranjan Sapkota, Aditya Jain, Anas Zafar, Muneeb Ul Hassan, Aizan Zafar, Hasan Maqbool, Ashmal Vayani, Jia Wu, Maged Shoman

Abstract: Generative AI is moving rapidly from research into real world deployment across sectors, which elevates the need for responsible development, deployment, evaluation, and governance. To address this pressing challenge, in this study, we synthesize the landscape of responsible generative AI across methods, benchmarks, and policies, and connects governance expectations to concrete engineering practic… ▽ More Generative AI is moving rapidly from research into real world deployment across sectors, which elevates the need for responsible development, deployment, evaluation, and governance. To address this pressing challenge, in this study, we synthesize the landscape of responsible generative AI across methods, benchmarks, and policies, and connects governance expectations to concrete engineering practice. We follow a prespecified search and screening protocol focused on post-ChatGPT era with selective inclusion of foundational work for definitions, and we conduct a narrative and thematic synthesis. Three findings emerge; First, benchmark and practice coverage is dense for bias and toxicity but relatively sparse for privacy and provenance, deepfake and media integrity risk, and system level failure in tool using and agentic settings. Second, many evaluations remain static and task local, which limits evidence portability for audit and lifecycle assurance. Third, documentation and metric validity are inconsistent, which complicates comparison across releases and domains. We outline a research and practice agenda that prioritizes adaptive and multimodal evaluation, privacy and provenance testing, deepfake risk assessment, calibration and uncertainty reporting, versioned and documented artifacts, and continuous monitoring. Limitations include reliance on public artifacts and the focus period, which may under represent capabilities reported later. The survey offers a path to align development and evaluation with governance needs and to support safe, transparent, and accountable deployment across domains. Project page: https://anas-zafar.github.io/responsible-ai.github.io , GitHub: https://github.com/anas-zafar/Responsible-AI △ Less

Submitted 24 September, 2025; v1 submitted 15 January, 2025; originally announced February 2025.

Comments: under review

arXiv:2501.18648 [pdf, other]

Multimodal Large Language Models for Image, Text, and Speech Data Augmentation: A Survey

Authors: Ranjan Sapkota, Shaina Raza, Maged Shoman, Achyut Paudel, Manoj Karkee

Abstract: In the past five years, research has shifted from traditional Machine Learning (ML) and Deep Learning (DL) approaches to leveraging Large Language Models (LLMs) , including multimodality, for data augmentation to enhance generalization, and combat overfitting in training deep convolutional neural networks. However, while existing surveys predominantly focus on ML and DL techniques or limited modal… ▽ More In the past five years, research has shifted from traditional Machine Learning (ML) and Deep Learning (DL) approaches to leveraging Large Language Models (LLMs) , including multimodality, for data augmentation to enhance generalization, and combat overfitting in training deep convolutional neural networks. However, while existing surveys predominantly focus on ML and DL techniques or limited modalities (text or images), a gap remains in addressing the latest advancements and multi-modal applications of LLM-based methods. This survey fills that gap by exploring recent literature utilizing multimodal LLMs to augment image, text, and audio data, offering a comprehensive understanding of these processes. We outlined various methods employed in the LLM-based image, text and speech augmentation, and discussed the limitations identified in current approaches. Additionally, we identified potential solutions to these limitations from the literature to enhance the efficacy of data augmentation practices using multimodal LLMs. This survey serves as a foundation for future research, aiming to refine and expand the use of multimodal LLMs in enhancing dataset quality and diversity for deep learning applications. (Surveyed Paper GitHub Repo: https://github.com/WSUAgRobotics/data-aug-multi-modal-llm. Keywords: LLM data augmentation, Grok text data augmentation, DeepSeek image data augmentation, Grok speech data augmentation, GPT audio augmentation, voice augmentation, DeepSeek for data augmentation, DeepSeek R1 text data augmentation, DeepSeek R1 image augmentation, Image Augmentation using LLM, Text Augmentation using LLM, LLM data augmentation for deep learning applications) △ Less

Submitted 21 March, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

Comments: 52 pages

arXiv:2406.19407 [pdf]

doi 10.1007/s10462-025-11253-3

YOLO advances to its genesis: a decadal and comprehensive review of the You Only Look Once (YOLO) series

Authors: Ranjan Sapkota, Marco Flores Calero, Rizwan Qureshi, Chetan Badgujar, Upesh Nepal, Alwin Poulose, Peter Zeno, Uday Bhanu Prakash Vaddevolu, Sheheryar Khan, Maged Shoman, Hong Yan, Manoj Karkee

Abstract: This review systematically examines the progression of the You Only Look Once (YOLO) object detection algorithms from YOLOv1 to the recently unveiled YOLOv12. Employing a reverse chronological analysis, this study examines the advancements introduced by YOLO algorithms, beginning with YOLOv12 and progressing through YOLO11 (or YOLOv11), YOLOv10, YOLOv9, YOLOv8, and subsequent versions to explore e… ▽ More This review systematically examines the progression of the You Only Look Once (YOLO) object detection algorithms from YOLOv1 to the recently unveiled YOLOv12. Employing a reverse chronological analysis, this study examines the advancements introduced by YOLO algorithms, beginning with YOLOv12 and progressing through YOLO11 (or YOLOv11), YOLOv10, YOLOv9, YOLOv8, and subsequent versions to explore each version's contributions to enhancing speed, detection accuracy, and computational efficiency in real-time object detection. Additionally, this study reviews the alternative versions derived from YOLO architectural advancements of YOLO-NAS, YOLO-X, YOLO-R, DAMO-YOLO, and Gold-YOLO. Moreover, the study highlights the transformative impact of YOLO models across five critical application areas: autonomous vehicles and traffic safety, healthcare and medical imaging, industrial manufacturing, surveillance and security, and agriculture. By detailing the incremental technological advancements in subsequent YOLO versions, this review chronicles the evolution of YOLO, and discusses the challenges and limitations in each of the earlier versions. The evolution signifies a path towards integrating YOLO with multimodal, context-aware, and Artificial General Intelligence (AGI) systems for the next YOLO decade, promising significant implications for future developments in AI-driven applications. YOLO Review, YOLO Advances, YOLOv13, YOLOv14, YOLOv15, YOLOv16, YOLOv17, YOLOv18, YOLOv19, YOLOv20, YOLO review, YOLO Object Detection △ Less

Submitted 13 June, 2025; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Published in Artificial Intelligence Review as https://doi.org/10.1007/s10462-025-11253-3

Journal ref: Artificial Intelligence Review, SpringerNature, 2025

arXiv:2404.10078 [pdf, other]

Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets

Authors: Dai Quoc Tran, Armstrong Aboah, Yuntae Jeon, Maged Shoman, Minsoo Park, Seunghee Park

Abstract: This study addresses the evolving challenges in urban traffic monitoring detection systems based on fisheye lens cameras by proposing a framework that improves the efficacy and accuracy of these systems. In the context of urban infrastructure and transportation management, advanced traffic monitoring systems have become critical for managing the complexities of urbanization and increasing vehicle… ▽ More This study addresses the evolving challenges in urban traffic monitoring detection systems based on fisheye lens cameras by proposing a framework that improves the efficacy and accuracy of these systems. In the context of urban infrastructure and transportation management, advanced traffic monitoring systems have become critical for managing the complexities of urbanization and increasing vehicle density. Traditional monitoring methods, which rely on static cameras with narrow fields of view, are ineffective in dynamic urban environments, necessitating the installation of multiple cameras, which raises costs. Fisheye lenses, which were recently introduced, provide wide and omnidirectional coverage in a single frame, making them a transformative solution. However, issues such as distorted views and blurriness arise, preventing accurate object detection on these images. Motivated by these challenges, this study proposes a novel approach that combines a ransformer-based image enhancement framework and ensemble learning technique to address these challenges and improve traffic monitoring accuracy, making significant contributions to the future of intelligent traffic management systems. Our proposed methodological framework won 5th place in the 2024 AI City Challenge, Track 4, with an F1 score of 0.5965 on experimental validation data. The experimental results demonstrate the effectiveness, efficiency, and robustness of the proposed system. Our code is publicly available at https://github.com/daitranskku/AIC2024-TRACK4-TEAM15. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.08229 [pdf, other]

Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis

Authors: Maged Shoman, Dongdong Wang, Armstrong Aboah, Mohamed Abdel-Aty

Abstract: This paper introduces our solution for Track 2 in AI City Challenge 2024. The task aims to solve traffic safety description and analysis with the dataset of Woven Traffic Safety (WTS), a real-world Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding. Our solution mainly focuses on the following points: 1) To solve dense video captioning, we leverage the framewo… ▽ More This paper introduces our solution for Track 2 in AI City Challenge 2024. The task aims to solve traffic safety description and analysis with the dataset of Woven Traffic Safety (WTS), a real-world Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding. Our solution mainly focuses on the following points: 1) To solve dense video captioning, we leverage the framework of dense video captioning with parallel decoding (PDVC) to model visual-language sequences and generate dense caption by chapters for video. 2) Our work leverages CLIP to extract visual features to more efficiently perform cross-modality training between visual and textual representations. 3) We conduct domain-specific model adaptation to mitigate domain shift problem that poses recognition challenge in video understanding. 4) Moreover, we leverage BDD-5K captioned videos to conduct knowledge transfer for better understanding WTS videos and more accurate captioning. Our solution has yielded on the test set, achieving 6th place in the competition. The open source code will be available at https://github.com/UCF-SST-Lab/AICity2024CVPRW △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2311.05120 [pdf]

Quranic Conversations: Developing a Semantic Search tool for the Quran using Arabic NLP Techniques

Authors: Yasser Shohoud, Maged Shoman, Sarah Abdelazim

Abstract: The Holy Book of Quran is believed to be the literal word of God (Allah) as revealed to the Prophet Muhammad (PBUH) over a period of approximately 23 years. It is the book where God provides guidance on how to live a righteous and just life, emphasizing principles like honesty, compassion, charity and justice, as well as providing rules for personal conduct, family matters, business ethics and muc… ▽ More The Holy Book of Quran is believed to be the literal word of God (Allah) as revealed to the Prophet Muhammad (PBUH) over a period of approximately 23 years. It is the book where God provides guidance on how to live a righteous and just life, emphasizing principles like honesty, compassion, charity and justice, as well as providing rules for personal conduct, family matters, business ethics and much more. However, due to constraints related to the language and the Quran organization, it is challenging for Muslims to get all relevant ayahs (verses) pertaining to a matter or inquiry of interest. Hence, we developed a Quran semantic search tool which finds the verses pertaining to the user inquiry or prompt. To achieve this, we trained several models on a large dataset of over 30 tafsirs, where typically each tafsir corresponds to one verse in the Quran and, using cosine similarity, obtained the tafsir tensor which is most similar to the prompt tensor of interest, which was then used to index for the corresponding ayah in the Quran. Using the SNxLM model, we were able to achieve a cosine similarity score as high as 0.97 which corresponds to the abdu tafsir for a verse relating to financial matters. △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2305.07454 [pdf]

Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs

Authors: Abdul Rashid Mussah, Maged Shoman, Mark Amo-Boateng, Yaw Adu-Gyamfi

Abstract: Real-time traffic and sensor data from connected vehicles have the potential to provide insights that will lead to the immediate benefit of efficient management of the transportation infrastructure and related adjacent services. However, the growth of electric vehicles (EVs) and connected vehicles (CVs) has generated an abundance of CV data and sensor data that has put a strain on the processing c… ▽ More Real-time traffic and sensor data from connected vehicles have the potential to provide insights that will lead to the immediate benefit of efficient management of the transportation infrastructure and related adjacent services. However, the growth of electric vehicles (EVs) and connected vehicles (CVs) has generated an abundance of CV data and sensor data that has put a strain on the processing capabilities of existing data center infrastructure. As a result, the benefits are either delayed or not fully realized. To address this issue, we propose a solution for processing state-wide CV traffic and sensor data on GPUs that provides real-time micro-scale insights in both temporal and spatial dimensions. This is achieved through the use of the Nvidia Rapids framework and the Dask parallel cluster in Python. Our findings demonstrate a 70x acceleration in the extraction, transformation, and loading (ETL) of CV data for the State of Missouri for a full day of all unique CV journeys, reducing the processing time from approximately 48 hours to just 25 minutes. Given that these results are for thousands of CVs and several thousands of individual journeys with sub-second sensor data, implies that we can model and obtain actionable insights for the management of the transportation infrastructure. △ Less

Submitted 8 May, 2023; originally announced May 2023.

Comments: Presented at the 102nd Transportation Research Board (TRB) Annual Meeting

arXiv:2211.08541 [pdf, other]

GC-GRU-N for Traffic Prediction using Loop Detector Data

Authors: Maged Shoman, Armstrong Aboah, Abdulateef Daud, Yaw Adu-Gyamfi

Abstract: Because traffic characteristics display stochastic nonlinear spatiotemporal dependencies, traffic prediction is a challenging task. In this paper develop a graph convolution gated recurrent unit (GC GRU N) network to extract the essential Spatio temporal features. we use Seattle loop detector data aggregated over 15 minutes and reframe the problem through space and time. The model performance is c… ▽ More Because traffic characteristics display stochastic nonlinear spatiotemporal dependencies, traffic prediction is a challenging task. In this paper develop a graph convolution gated recurrent unit (GC GRU N) network to extract the essential Spatio temporal features. we use Seattle loop detector data aggregated over 15 minutes and reframe the problem through space and time. The model performance is compared o benchmark models; Historical Average, Long Short Term Memory (LSTM), and Transformers. The proposed model ranked second with the fastest inference time and a very close performance to first place (Transformers). Our model also achieves a running time that is six times faster than transformers. Finally, we present a comparative study of our model and the available benchmarks using metrics such as training time, inference time, MAPE, MAE and RMSE. Spatial and temporal aspects are also analyzed for each of the trained models. △ Less

Submitted 13 November, 2022; originally announced November 2022.

arXiv:2204.08584 [pdf]

A Region-Based Deep Learning Approach to Automated Retail Checkout

Authors: Maged Shoman, Armstrong Aboah, Alex Morehead, Ye Duan, Abdulateef Daud, Yaw Adu-Gyamfi

Abstract: Automating the product checkout process at conventional retail stores is a task poised to have large impacts on society generally speaking. Towards this end, reliable deep learning models that enable automated product counting for fast customer checkout can make this goal a reality. In this work, we propose a novel, region-based deep learning approach to automate product counting using a customize… ▽ More Automating the product checkout process at conventional retail stores is a task poised to have large impacts on society generally speaking. Towards this end, reliable deep learning models that enable automated product counting for fast customer checkout can make this goal a reality. In this work, we propose a novel, region-based deep learning approach to automate product counting using a customized YOLOv5 object detection pipeline and the DeepSORT algorithm. Our results on challenging, real-world test videos demonstrate that our method can generalize its predictions to a sufficient level of accuracy and with a fast enough runtime to warrant deployment to real-world commercial settings. Our proposed method won 4th place in the 2022 AI City Challenge, Track 4, with an F1 score of 0.4400 on experimental validation data. △ Less

Submitted 18 April, 2022; originally announced April 2022.

arXiv:2201.13284 [pdf]

doi 10.1177/0361198121989726

Exploring Preferences for Transportation Modes in the City of Munich after the Recent Incorporation of Ride-Hailing Companies

Authors: Maged Shoman, Ana Tsui Moreno

Abstract: The growth of ridehailing (RH) companies over the past few years has affected urban mobility in numerous ways. Despite widespread claims about the benefits of such services, limited research has been conducted on the topic. This paper assesses the willingness of Munich transportation users to pay for RH services. Realizing the difficulty of obtaining data directly from RH companies, a stated prefe… ▽ More The growth of ridehailing (RH) companies over the past few years has affected urban mobility in numerous ways. Despite widespread claims about the benefits of such services, limited research has been conducted on the topic. This paper assesses the willingness of Munich transportation users to pay for RH services. Realizing the difficulty of obtaining data directly from RH companies, a stated preference survey was designed. The dataset includes responses from 500 commuters. Sociodemographic attributes, current travel behavior and transportation mode preference in an 8 km trip scenario using RH service and its similar modes (auto and transit), were collected. A multinomial logit model was used to estimate the time and cost coefficients for using RH services across income groups, which was then used to estimate the value of time (VOT) for RH. The model results indicate RH services popularity among those aged 18 to 39, larger households and households with fewer autos. Higher income groups are also willing to pay more for using RH services. To examine the impact of RH services on modal split in the city of Munich, we incorporated RH as a new mode into an existing nested logit mode choice model using an incremental logit. Travel time, travel cost and VOT were used as measures for the choice commuters make when choosing between RH and its closest mode, metro. A total of 20 scenarios were evaluated at four different congestion levels and four price levels to reflect the demand in response to acceptable costs and time tradeoffs. △ Less

Submitted 28 January, 2022; originally announced January 2022.

Report number: Vol. 2675(5) 329--338

Journal ref: Transportation Research Record 2021

arXiv:2104.06856 [pdf]

A Vision-based System for Traffic Anomaly Detection using Deep Learning and Decision Trees

Authors: Armstrong Aboah, Maged Shoman, Vishal Mandal, Sayedomidreza Davami, Yaw Adu-Gyamfi, Anuj Sharma

Abstract: Any intelligent traffic monitoring system must be able to detect anomalies such as traffic accidents in real time. In this paper, we propose a Decision-Tree - enabled approach powered by Deep Learning for extracting anomalies from traffic cameras while accurately estimating the start and end time of the anomalous event. Our approach included creating a detection model, followed by anomaly detectio… ▽ More Any intelligent traffic monitoring system must be able to detect anomalies such as traffic accidents in real time. In this paper, we propose a Decision-Tree - enabled approach powered by Deep Learning for extracting anomalies from traffic cameras while accurately estimating the start and end time of the anomalous event. Our approach included creating a detection model, followed by anomaly detection and analysis. YOLOv5 served as the foundation for our detection model. The anomaly detection and analysis step entail traffic scene background estimation, road mask extraction, and adaptive thresholding. Candidate anomalies were passed through a decision tree to detect and analyze final anomalies. The proposed approach yielded an F1 score of 0.8571, and an S4 score of 0.5686, per the experimental validation. △ Less

Submitted 14 April, 2021; originally announced April 2021.

Showing 1–14 of 14 results for author: Shoman, M