-
Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection
Authors:
Mohamed Afifi,
Mohamed ElHelw
Abstract:
Perception is a key element for enabling intelligent autonomous navigation. Understanding the semantics of the surrounding environment and accurate vehicle pose estimation are essential capabilities for autonomous vehicles, including self-driving cars and mobile robots that perform complex tasks. Fast-moving platforms like self-driving cars pose a hard challenge for localization and mapping algorithms. In this work, we propose a novel framework for real-time LiDAR odometry and mapping, based on the LOAM architecture, for fast-moving platforms. Our framework utilizes semantic information produced by a deep learning model to improve point-to-line and point-to-plane matching between LiDAR scans and to build a semantic map of the environment, leading to more accurate motion estimation using LiDAR data. We observe that including semantic information in the matching process introduces a new type of outlier match, where matching occurs between different objects of the same semantic class. To this end, we propose a novel algorithm that explicitly identifies and discards potential outliers in the matching process. In our experiments, we study the effect of improving the matching process on the robustness of LiDAR odometry against high-speed motion. Our experimental evaluations on the KITTI dataset demonstrate that utilizing semantic information and rejecting outliers significantly enhance the robustness of LiDAR odometry and mapping when there are large gaps between scan acquisition poses, which is typical for fast-moving platforms.
Submitted 5 March, 2024;
originally announced March 2024.
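The semantic matching idea in the abstract above can be illustrated with a minimal sketch: a point in one scan is only allowed to match points of the same semantic class in the other scan. Everything here is an assumption made for illustration, including the function name `match_with_semantics`, the `(x, y, z, label)` tuple layout, and the `max_dist` threshold. In particular, the distance threshold is a generic stand-in for outlier rejection and is not the outlier detection algorithm proposed in the paper, which the abstract does not describe.

```python
import math

def match_with_semantics(src, dst, max_dist=1.0):
    """Match each source point to its nearest same-label target point.

    src, dst: lists of (x, y, z, label) tuples (hypothetical layout).
    max_dist: illustrative rejection threshold, NOT the paper's method.
    Returns a list of (src_index, dst_index, distance) tuples.
    """
    matches = []
    for i, (sx, sy, sz, s_label) in enumerate(src):
        best_j, best_d = None, float("inf")
        for j, (dx, dy, dz, d_label) in enumerate(dst):
            if d_label != s_label:
                continue  # semantic gate: never match across classes
            d = math.dist((sx, sy, sz), (dx, dy, dz))
            if d < best_d:
                best_j, best_d = j, d
        # Drop points with no same-class candidate or a too-distant one.
        if best_j is not None and best_d <= max_dist:
            matches.append((i, best_j, best_d))
    return matches

scan_a = [(0.0, 0.0, 0.0, "car"), (5.0, 0.0, 0.0, "pole")]
scan_b = [(0.1, 0.0, 0.0, "pole"), (0.4, 0.0, 0.0, "car"), (9.0, 0.0, 0.0, "pole")]
result = match_with_semantics(scan_a, scan_b)
```

In this toy input the car point ignores the geometrically closer pole point and matches the other car point, while the pole point is discarded because its nearest same-class candidate is 4 m away. That second case loosely mirrors the situation the abstract describes, where two different objects share a semantic class.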
-
Rice Plant Disease Detection and Diagnosis using Deep Convolutional Neural Networks and Multispectral Imaging
Authors:
Yara Ali Alnaggar,
Ahmad Sebaq,
Karim Amer,
ElSayed Naeem,
Mohamed Elhelw
Abstract:
Rice is considered a strategic crop in Egypt as it is regularly consumed in the Egyptian people's diet. Even though Egypt is the highest rice producer in Africa with a share of 6 million tons per year, it still imports rice to satisfy its local needs due to production loss, especially due to rice disease. Rice blast disease is responsible for 30% loss in rice production worldwide. Therefore, it is crucial to limit yield damage by detecting rice crop diseases in their early stages. This paper introduces a public dataset of multispectral and RGB images and a deep learning pipeline for rice plant disease detection using multi-modal data. The collected multispectral images consist of Red, Green and Near-Infrared channels, and we show that using multispectral along with RGB channels as input achieves a higher F1 score compared to using RGB input only.
Submitted 11 September, 2023;
originally announced September 2023.
-
RSDiff: Remote Sensing Image Generation from Text Using Diffusion Model
Authors:
Ahmad Sebaq,
Mohamed ElHelw
Abstract:
The generation and enhancement of satellite imagery are critical in remote sensing, requiring high-quality, detailed images for accurate analysis. This research introduces a two-stage diffusion model methodology for synthesizing high-resolution satellite images from textual prompts. The pipeline comprises a Low-Resolution Diffusion Model (LRDM) that generates initial images based on text inputs and a Super-Resolution Diffusion Model (SRDM) that refines these images into high-resolution outputs. The LRDM merges text and image embeddings within a shared latent space, capturing essential scene content and structure. The SRDM then enhances these images, focusing on spatial features and visual clarity. Experiments conducted using the Remote Sensing Image Captioning Dataset (RSICD) demonstrate that our method outperforms existing models, producing satellite images with accurate geographical details and improved spatial resolution.
Submitted 5 October, 2024; v1 submitted 3 September, 2023;
originally announced September 2023.
-
STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map
Authors:
Ammar Sherif,
Abubakar Abid,
Mustafa Elattar,
Mohamed ElHelw
Abstract:
Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL). However, MTL is often challenging because there is an exponential number of possible task groupings, which makes it difficult to choose the best one, and some groupings might degrade performance due to negative interference between tasks. As a result, existing solutions suffer from severe scalability issues, limiting practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping based on re-proposed data-driven features, Data Maps, which capture the training dynamics for each classification task during the MTL training. Through a theoretical comparison with other techniques, we show that our approach has superior scalability. Our experiments show better performance and verify the method's effectiveness, even on an unprecedented number of tasks (up to 100 tasks on CIFAR100). As this is the first work on such a number of tasks, we compare the resulting groupings with those defined in the CIFAR100 dataset and find them similar. Finally, we provide a modular implementation for easier integration and testing, with examples from multiple datasets and tasks.
Submitted 26 May, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
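As a rough illustration of what "training dynamics" can mean for the Data Maps mentioned in the abstract above, the sketch below computes two commonly used per-sample statistics: confidence (the mean probability given to the true label across epochs) and variability (the standard deviation of that probability). This follows the general dataset-cartography idea and is an assumption about the feature definition. The abstract does not say how the paper builds or groups these features, and the function name `data_map` and the input layout are hypothetical.

```python
from statistics import mean, pstdev

def data_map(true_label_probs):
    """Summarize training dynamics per sample.

    true_label_probs[i][e] is the probability the model assigned to the
    true label of sample i at epoch e (hypothetical input layout).
    Returns one (confidence, variability) pair per sample.
    """
    return [(mean(probs), pstdev(probs)) for probs in true_label_probs]

# Two samples tracked over a few epochs of training.
features = data_map([
    [0.5, 0.5, 0.5],  # steady predictions: zero variability
    [0.2, 0.8],       # predictions that swing between epochs
])
```

Both samples have the same confidence here but different variability, which is the kind of distinction such features are meant to expose.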
-
Robust Real-Time Pedestrian Detection on Embedded Devices
Authors:
Mohamed Afifi,
Yara Ali,
Karim Amer,
Mahmoud Shaker,
Mohamed Elhelw
Abstract:
Detection of pedestrians on embedded devices, such as those on board robots and drones, has many applications including road intersection monitoring, security, crowd monitoring and surveillance, to name a few. However, the problem can be challenging due to continuously changing camera viewpoints and varying object appearances as well as the need for lightweight algorithms suitable for embedded systems. This paper proposes a robust framework for pedestrian detection across many kinds of footage. The framework performs fine and coarse detections on different image regions and exploits temporal and spatial characteristics to attain enhanced accuracy and real-time performance on embedded boards. The framework uses the Yolo-v3 object detector [1] as its backbone detector and runs on the Nvidia Jetson TX2 embedded board; however, other detectors and/or boards can be used as well. The performance of the framework is demonstrated on two established datasets and by its second-place finish in the CVPR 2019 Embedded Real-Time Inference (ERTI) Challenge.
Submitted 13 December, 2020;
originally announced December 2020.
-
Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds
Authors:
Yara Ali Alnaggar,
Mohamed Afifi,
Karim Amer,
Mohamed Elhelw
Abstract:
Semantic segmentation of 3D point cloud data is essential for enhanced high-level perception in autonomous platforms. Furthermore, given the increasing deployment of LiDAR sensors onboard cars and drones, a special emphasis is also placed on non-computationally intensive algorithms that operate on mobile GPUs. Previous efficient state-of-the-art methods relied on 2D spherical projection of point clouds as input for 2D fully convolutional neural networks to balance the accuracy-speed trade-off. This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud to mitigate the loss of information inherent in single-projection methods. Our Multi-Projection Fusion (MPF) framework analyzes spherical and bird's-eye view projections using two separate highly efficient 2D fully convolutional models, then combines the segmentation results of both views. The proposed framework is validated on the SemanticKITTI dataset, where it achieved an mIoU of 55.5, which is higher than the state-of-the-art projection-based methods RangeNet++ and PolarNet, while being 1.6x faster than the former and 3.1x faster than the latter.
Submitted 6 November, 2020; v1 submitted 3 November, 2020;
originally announced November 2020.
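The mIoU figure quoted in the abstract above is a mean of per-class intersection-over-union scores. The following is a minimal sketch of that standard metric on flat label lists. The function name `mean_iou` and the choice to skip classes that appear in neither list are assumptions for illustration, and benchmark-specific details such as ignored classes are not modeled.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union across classes.

    y_true, y_pred: equal-length lists of integer class ids, one per point.
    Classes absent from both lists are skipped (an illustrative choice).
    """
    ious = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        union = tp + fp + fn
        if union:
            ious.append(tp / union)  # IoU = TP / (TP + FP + FN)
    return sum(ious) / len(ious) if ious else 0.0

# Class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Benchmarks usually report the value as a percentage, so a score of 0.555 from a function like this would correspond to the 55.5 form used in the abstract.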
-
Overview of Surgical Simulation
Authors:
Mohamed A. ElHelw
Abstract:
Motivated by the current demand of clinical governance, surgical simulation is now a well-established modality for basic skills training and assessment. The practical deployment of the technique is a multi-disciplinary venture encompassing areas in engineering, medicine and psychology. This paper provides an overview of the key topics involved in surgical simulation and associated technical challenges. The paper discusses the clinical motivation for surgical simulation, the use of virtual environments for surgical training, model acquisition and simplification, deformable models, collision detection, tissue property measurement, haptic rendering and image synthesis. Additional topics include surgical skill training and assessment metrics as well as challenges facing the incorporation of surgical simulation into medical education curricula.
Submitted 6 May, 2020;
originally announced May 2020.
-
Robust Real-time Pedestrian Detection in Aerial Imagery on Jetson TX2
Authors:
Mohamed Afifi,
Yara Ali,
Karim Amer,
Mahmoud Shaker,
Mohamed ElHelw
Abstract:
Detection of pedestrians in aerial imagery captured by drones has many applications including intersection monitoring, patrolling, and surveillance, to name a few. However, the problem is involved due to continuously changing camera viewpoint and object appearance as well as the need for lightweight algorithms to run on on-board embedded systems. To address this issue, the paper proposes a framework for pedestrian detection in videos based on the YOLO object detection network [6] while maintaining a throughput of more than 5 FPS on the Jetson TX2 embedded board. The framework exploits deep learning for robust operation and uses a pre-trained model without the need for any additional training, which makes it flexible to apply to different setups with a minimal amount of tuning. The method achieves ~81 mAP when applied to a sample video from the Embedded Real-Time Inference (ERTI) Challenge where pedestrians are monitored by a UAV.
Submitted 16 May, 2019;
originally announced May 2019.
-
Drone Path-Following in GPS-Denied Environments using Convolutional Networks
Authors:
M. Samy,
K. Amer,
M. Shaker,
M. ElHelw
Abstract:
This paper presents a simple approach for drone navigation to follow a predetermined path using visual input only, without reliance on a Global Positioning System (GPS). A Convolutional Neural Network (CNN) is used to output the steering command of the drone in an end-to-end approach. We tested our approach in two simulated environments in the Unreal Engine using the AirSim plugin for drone simulation. Results show that the proposed approach, despite its simplicity, achieves an average cross-track distance of less than 2.9 meters in the simulated environment. We also investigate the significance of data augmentation in path following. Finally, we conclude by suggesting possible enhancements for extending our approach to more difficult paths in real life, in the hope that one day visual navigation will become the norm in GPS-denied zones.
Submitted 5 May, 2019;
originally announced May 2019.
-
Deep Convolutional Neural Network-Based Autonomous Drone Navigation
Authors:
K. Amer,
M. Samy,
M. Shaker,
M. ElHelw
Abstract:
This paper presents a novel approach for aerial drone autonomous navigation along predetermined paths using only visual input from an onboard camera and without reliance on a Global Positioning System (GPS). It is based on using a deep Convolutional Neural Network (CNN) combined with a regressor to output the drone steering commands. Furthermore, multiple auxiliary navigation paths that form a navigation envelope are used for data augmentation to make the system adaptable to real-life deployment scenarios. The approach is suitable for automating drone navigation in applications that exhibit regular trips or visits to the same locations, such as environmental and desertification monitoring, parcel/aid delivery and drone-based wireless internet delivery. In this case, the proposed algorithm replaces human operators, enhances accuracy of GPS-based map navigation, alleviates problems related to GPS-spoofing and enables navigation in GPS-denied environments. Our system is tested in two scenarios using the Unreal Engine-based AirSim plugin for drone simulation, with promising results: an average cross-track distance of less than 1.4 meters and a mean waypoint minimum distance of less than 1 meter.
Submitted 5 May, 2019;
originally announced May 2019.
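The path-following abstracts above report results as an average cross-track distance. One plausible reading of that metric, sketched below, is the mean distance from each flown position to the nearest point on the reference path, with the path treated as a 2D polyline. The abstracts do not give the exact definition the authors used, so the function names, the 2D simplification, and the nearest-segment rule are all assumptions made for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (all 2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def mean_cross_track_distance(trajectory, path):
    """Average distance from each trajectory point to the nearest path segment."""
    segments = list(zip(path, path[1:]))
    dists = [min(point_segment_distance(p, a, b) for a, b in segments)
             for p in trajectory]
    return sum(dists) / len(dists)

reference_path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
flown = [(2.0, 1.0), (5.0, -2.0), (11.0, 5.0)]
error = mean_cross_track_distance(flown, reference_path)
```

For this toy trajectory the three positions lie 1 m, 2 m and 1 m from the path, giving an average of about 1.33 m.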