Search | arXiv e-print repository

Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection

Abstract: Perception is a key element for enabling intelligent autonomous navigation. Understanding the semantics of the surrounding environment and accurately estimating vehicle pose are essential capabilities for autonomous vehicles, including self-driving cars and mobile robots that perform complex tasks. Fast-moving platforms such as self-driving cars pose a hard challenge for localization and mapping algorithms. In this work, we propose a novel framework for real-time LiDAR odometry and mapping, based on the LOAM architecture, for fast-moving platforms. Our framework uses semantic information produced by a deep learning model to improve point-to-line and point-to-plane matching between LiDAR scans and to build a semantic map of the environment, leading to more accurate motion estimation from LiDAR data. We observe that including semantic information in the matching process introduces a new type of outlier match, in which points on different objects of the same semantic class are matched to each other. To address this, we propose a novel algorithm that explicitly identifies and discards potential outliers in the matching process. In our experiments, we study how the improved matching process affects the robustness of LiDAR odometry against high-speed motion. Our experimental evaluations on the KITTI dataset demonstrate that using semantic information and rejecting outliers significantly enhances the robustness of LiDAR odometry and mapping when there are large gaps between scan acquisition poses, which is typical for fast-moving platforms.
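To make the new outlier type concrete, here is a minimal brute-force sketch, assuming each LiDAR point carries an integer semantic label. Each source point is matched only to target points of the same class, and the match is discarded if its distance exceeds a fixed threshold. The function names and the simple distance gate are hypothetical illustrations, not the paper's outlier-detection algorithm; a real LOAM-style pipeline matches edge and planar features and would use a k-d tree instead of a linear scan.

```python
import math
from typing import List, Optional, Tuple

# A labeled point: (x, y, z, semantic_label). Label values are hypothetical ints.
Point = Tuple[float, float, float, int]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between the xyz parts of two labeled points."""
    return math.dist(a[:3], b[:3])


def semantic_matches(
    source: List[Point], target: List[Point], max_dist: float
) -> List[Tuple[int, Optional[int]]]:
    """Match each source point to the nearest target point of the SAME label.

    A same-label match farther than max_dist is treated as an outlier (for
    example, one pole matched to a different pole) and reported as None.
    """
    result: List[Tuple[int, Optional[int]]] = []
    for i, p in enumerate(source):
        candidates = [(distance(p, q), j) for j, q in enumerate(target) if q[3] == p[3]]
        if not candidates:
            result.append((i, None))  # no point of this class in the target scan
            continue
        d, j = min(candidates)
        result.append((i, j if d <= max_dist else None))
    return result


src = [(0.0, 0.0, 0.0, 1), (5.0, 0.0, 0.0, 2)]
tgt = [(0.2, 0.0, 0.0, 1), (0.1, 0.0, 0.0, 2), (9.0, 0.0, 0.0, 2)]
print(semantic_matches(src, tgt, max_dist=1.0))  # prints [(0, 0), (1, None)]
```

The second source point does have a same-class neighbour (4 m away), but the gate rejects it; without the gate it would be accepted as a plausible-looking but wrong correspondence.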

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2309.05818 [pdf, other]

doi 10.1007/978-3-031-21595-7

Rice Plant Disease Detection and Diagnosis using Deep Convolutional Neural Networks and Multispectral Imaging

Authors: Yara Ali Alnaggar, Ahmad Sebaq, Karim Amer, ElSayed Naeem, Mohamed Elhelw

Abstract: Rice is considered a strategic crop in Egypt, as it is a regular part of the Egyptian diet. Even though Egypt is the largest rice producer in Africa, with 6 million tons per year, it still imports rice to satisfy local needs because of production losses, especially those caused by rice diseases. Rice blast disease is responsible for a 30% loss in rice production worldwide. It is therefore crucial to limit yield damage by detecting rice crop diseases in their early stages. This paper introduces a public dataset of multispectral and RGB images and a deep learning pipeline for rice plant disease detection using multi-modal data. The collected multispectral images consist of Red, Green and Near-Infrared channels, and we show that using the multispectral channels along with the RGB channels as input achieves a higher F1 score than using RGB input only.
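The F1 score used in this comparison is the harmonic mean of precision and recall. A minimal helper, assuming binary counts of true positives, false positives and false negatives; how the paper averages F1 across disease classes is not stated here, so treat any per-class averaging as an assumption.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision tp/(tp+fp) and recall tp/(tp+fn),
    written in the equivalent form 2*tp / (2*tp + fp + fn)."""
    denom = 2 * tp + fp + fn
    # Convention: return 0 when there are no positives at all, avoiding 0/0.
    return 0.0 if denom == 0 else 2 * tp / denom


# Precision 0.8 and recall 8/12 give F1 = 16/22.
print(round(f1_score(tp=8, fp=2, fn=4), 3))  # prints 0.727
```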

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.02455 [pdf, other]

doi 10.1007/s00521-024-10363-3

RSDiff: Remote Sensing Image Generation from Text Using Diffusion Model

Authors: Ahmad Sebaq, Mohamed ElHelw

Abstract: The generation and enhancement of satellite imagery are critical in remote sensing, requiring high-quality, detailed images for accurate analysis. This research introduces a two-stage diffusion model methodology for synthesizing high-resolution satellite images from textual prompts. The pipeline comprises a Low-Resolution Diffusion Model (LRDM) that generates initial images based on text inputs and a Super-Resolution Diffusion Model (SRDM) that refines these images into high-resolution outputs. The LRDM merges text and image embeddings within a shared latent space, capturing essential scene content and structure. The SRDM then enhances these images, focusing on spatial features and visual clarity. Experiments conducted using the Remote Sensing Image Captioning Dataset (RSICD) demonstrate that our method outperforms existing models, producing satellite images with accurate geographical details and improved spatial resolution.

Submitted 5 October, 2024; v1 submitted 3 September, 2023; originally announced September 2023.

Journal ref: Neural Comput & Applic (2024)

arXiv:2307.03374 [pdf, other]

doi 10.1088/2632-2153/ad4e04

STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map

Authors: Ammar Sherif, Abubakar Abid, Mustafa Elattar, Mohamed ElHelw

Abstract: Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL). However, MTL is often challenging because the number of possible task groupings is exponential, which makes it difficult to choose the best one, and some groupings can degrade performance due to negative interference between tasks. As a result, existing solutions suffer severely from scalability issues that limit their practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping, based on re-proposed data-driven features, Data Maps, which capture the training dynamics of each classification task during MTL training. Through a theoretical comparison with other techniques, we show that our approach has superior scalability. Our experiments show better performance and verify the method's effectiveness, even on an unprecedented number of tasks (up to 100 tasks on CIFAR100). As the first work to handle this many tasks, we compare the resulting grouping with the grouping given in the CIFAR100 dataset and find them similar. Finally, we provide a modular implementation for easier integration and testing, with examples from multiple datasets and tasks.
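Data Maps, as introduced in Dataset Cartography (Swayamdipta et al., 2020), summarize each training sample by two statistics of the probability the model assigns to the sample's true label across epochs: confidence (the mean) and variability (the standard deviation). The sketch below computes only those two statistics; how STG-MTL turns them into per-task features and groups tasks is not shown, and the input layout is an assumption.

```python
import statistics
from typing import Dict, List, Tuple


def data_map(gold_probs: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Compute (confidence, variability) for each sample.

    gold_probs maps a sample id to the probabilities the model assigned to that
    sample's true label, one value recorded at the end of each training epoch.
    Confidence is the mean over epochs; variability is the population standard
    deviation over epochs.
    """
    return {
        sample_id: (statistics.fmean(probs), statistics.pstdev(probs))
        for sample_id, probs in gold_probs.items()
    }


# Hypothetical three-epoch history for two samples of one task.
history = {"easy": [0.9, 0.9, 0.9], "ambiguous": [0.2, 0.8, 0.5]}
for sample_id, (conf, var) in data_map(history).items():
    print(sample_id, round(conf, 3), round(var, 3))
# easy 0.9 0.0
# ambiguous 0.5 0.245
```

Samples the model learns consistently land at high confidence and low variability, while ambiguous samples show high variability; these training-dynamics coordinates are what make Data Maps usable as task descriptors.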

Submitted 26 May, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Accepted to DMLR workshop @ ICML 23

arXiv:2012.07072 [pdf]

Robust Real-Time Pedestrian Detection on Embedded Devices

Authors: Mohamed Afifi, Yara Ali, Karim Amer, Mahmoud Shaker, Mohamed Elhelw

Abstract: Detection of pedestrians on embedded devices, such as those on board robots and drones, has many applications, including road intersection monitoring, security, crowd monitoring and surveillance, to name a few. However, the problem can be challenging due to the continuously changing camera viewpoint and varying object appearance, as well as the need for lightweight algorithms suitable for embedded systems. This paper proposes a robust framework for pedestrian detection in video footage. The framework performs fine and coarse detections on different image regions and exploits temporal and spatial characteristics to attain enhanced accuracy and real-time performance on embedded boards. The framework uses the YOLOv3 object detector [1] as its backbone detector and runs on the Nvidia Jetson TX2 embedded board; however, other detectors and/or boards can be used as well. The performance of the framework is demonstrated on two established datasets and by its second-place finish in the CVPR 2019 Embedded Real-Time Inference (ERTI) Challenge.

Submitted 13 December, 2020; originally announced December 2020.

arXiv:2011.01974 [pdf, other]

Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds

Authors: Yara Ali Alnaggar, Mohamed Afifi, Karim Amer, Mohamed Elhelw

Abstract: Semantic segmentation of 3D point cloud data is essential for enhanced high-level perception in autonomous platforms. Furthermore, given the increasing deployment of LiDAR sensors onboard cars and drones, special emphasis is also placed on computationally lightweight algorithms that can run on mobile GPUs. Previous efficient state-of-the-art methods relied on a 2D spherical projection of the point cloud as input to 2D fully convolutional neural networks to balance the accuracy-speed trade-off. This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud to mitigate the loss of information inherent in single-projection methods. Our Multi-Projection Fusion (MPF) framework analyzes spherical and bird's-eye view projections using two separate, highly efficient 2D fully convolutional models and then combines the segmentation results of both views. The proposed framework is validated on the SemanticKITTI dataset, where it achieves an mIoU of 55.5, higher than the state-of-the-art projection-based methods RangeNet++ and PolarNet, while being 1.6x faster than the former and 3.1x faster than the latter.
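The two views can be illustrated with a small sketch. The spherical (range-image) view commonly used by projection-based methods such as RangeNet++ maps a point's yaw to a column and its pitch to a row within the sensor's vertical field of view; the bird's-eye view drops the height and bins x and y into a grid. The image size and field-of-view limits below are assumed values similar to those commonly used for a 64-beam sensor, and the grid extent and cell size are arbitrary; MPF's actual projection parameters, including whether its bird's-eye view uses a Cartesian or polar grid, may differ.

```python
import math
from typing import Optional, Tuple


def spherical_pixel(
    x: float, y: float, z: float,
    width: int = 2048, height: int = 64,
    fov_up_deg: float = 3.0, fov_down_deg: float = -25.0,
) -> Tuple[int, int]:
    """Map a 3D point to (row, col) in a range image (assumed sensor parameters)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point at the sensor origin has no direction")
    fov_down = abs(math.radians(fov_down_deg))
    fov = abs(math.radians(fov_up_deg)) + fov_down  # total vertical field of view
    yaw = math.atan2(y, x)    # horizontal angle in [-pi, pi]
    pitch = math.asin(z / r)  # vertical angle
    u = 0.5 * (1.0 - yaw / math.pi) * width       # column coordinate
    v = (1.0 - (pitch + fov_down) / fov) * height  # row coordinate (0 = top)
    # Clamp points that fall outside the image (e.g. above the field of view).
    col = min(width - 1, max(0, math.floor(u)))
    row = min(height - 1, max(0, math.floor(v)))
    return row, col


def bev_cell(
    x: float, y: float, extent: float = 50.0, cell: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Map a point to a Cartesian bird's-eye-view cell (row, col), ignoring z.

    Returns None for points outside [-extent, extent) in x or y.
    """
    if not (-extent <= x < extent and -extent <= y < extent):
        return None
    return int((y + extent) // cell), int((x + extent) // cell)


print(spherical_pixel(10.0, 0.0, 0.0))  # prints (6, 1024)
print(bev_cell(10.0, 0.0))              # prints (100, 120)
```

Points that share a range-image pixel (for example, at different heights in the same direction) can still fall into different bird's-eye-view cells and vice versa, which is the kind of single-projection information loss the abstract refers to.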

Submitted 6 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: Accepted at the 2021 Winter Conference on Applications of Computer Vision (WACV 2021)

arXiv:2005.03011 [pdf]

Overview of Surgical Simulation

Authors: Mohamed A. ElHelw

Abstract: Motivated by the current demands of clinical governance, surgical simulation is now a well-established modality for basic skills training and assessment. The practical deployment of the technique is a multi-disciplinary venture encompassing areas in engineering, medicine and psychology. This paper provides an overview of the key topics involved in surgical simulation and the associated technical challenges. The paper discusses the clinical motivation for surgical simulation, the use of virtual environments for surgical training, model acquisition and simplification, deformable models, collision detection, tissue property measurement, haptic rendering and image synthesis. Additional topics include surgical skill training and assessment metrics, as well as the challenges facing the incorporation of surgical simulation into medical education curricula.

Submitted 6 May, 2020; originally announced May 2020.

ACM Class: I.6.3; I.6.8

arXiv:1905.06653 [pdf]

Robust Real-time Pedestrian Detection in Aerial Imagery on Jetson TX2

Authors: Mohamed Afifi, Yara Ali, Karim Amer, Mahmoud Shaker, Mohamed ElHelw

Abstract: Detection of pedestrians in aerial imagery captured by drones has many applications, including intersection monitoring, patrolling, and surveillance, to name a few. However, the problem is complicated by the continuously changing camera viewpoint and object appearance, as well as the need for lightweight algorithms that run on on-board embedded systems. To address this issue, the paper proposes a framework for pedestrian detection in videos based on the YOLO object detection network [6], achieving a high throughput of more than 5 FPS on the Jetson TX2 embedded board. The framework exploits deep learning for robust operation and uses a pre-trained model without the need for any additional training, which makes it flexible to apply to different setups with minimal tuning. The method achieves ~81 mAP when applied to a sample video from the Embedded Real-Time Inference (ERTI) Challenge, in which pedestrians are monitored by a UAV.

Submitted 16 May, 2019; originally announced May 2019.

arXiv:1905.01658 [pdf]

Drone Path-Following in GPS-Denied Environments using Convolutional Networks

Authors: M. Samy, K. Amer, M. Shaker, M. ElHelw

Abstract: This paper presents a simple approach for drone navigation that follows a predetermined path using visual input only, without reliance on a Global Positioning System (GPS). A Convolutional Neural Network (CNN) is used to output the steering command of the drone in an end-to-end approach. We tested our approach in two simulated environments in the Unreal Engine using the AirSim plugin for drone simulation. Results show that the proposed approach, despite its simplicity, achieves an average cross-track distance of less than 2.9 meters in the simulated environment. We also investigate the significance of data augmentation in path following. Finally, we conclude by suggesting possible enhancements for extending our approach to more difficult paths in real life, in the hope that one day visual navigation will become the norm in GPS-denied zones.
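Cross-track distance measures how far the drone strays sideways from the reference path. The abstract does not define it precisely, so the sketch below assumes a common 2D definition: for each recorded position, take the shortest distance to the reference polyline, then average over positions. Function names and the sample path are hypothetical.

```python
import math
from typing import List, Tuple

Point2 = Tuple[float, float]


def point_segment_distance(p: Point2, a: Point2, b: Point2) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def average_cross_track(trajectory: List[Point2], path: List[Point2]) -> float:
    """Mean distance from each flown position to the nearest path segment.

    The path must contain at least two points; the trajectory at least one.
    """
    errors = [
        min(point_segment_distance(p, path[i], path[i + 1]) for i in range(len(path) - 1))
        for p in trajectory
    ]
    return sum(errors) / len(errors)


path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]  # an L-shaped reference route
flown = [(2.0, 1.0), (5.0, -1.0), (11.0, 5.0)]  # each sample is 1 m off the route
print(average_cross_track(flown, path))  # prints 1.0
```

Measuring against segments rather than only the path's vertices avoids overestimating the error for positions that lie between two waypoints.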

Submitted 5 May, 2019; originally announced May 2019.

arXiv:1905.01657 [pdf]

Deep Convolutional Neural Network-Based Autonomous Drone Navigation

Authors: K. Amer, M. Samy, M. Shaker, M. ElHelw

Abstract: This paper presents a novel approach for autonomous aerial drone navigation along predetermined paths using only visual input from an onboard camera and without reliance on a Global Positioning System (GPS). It is based on a deep Convolutional Neural Network (CNN) combined with a regressor to output the drone steering commands. Furthermore, multiple auxiliary navigation paths that form a navigation envelope are used for data augmentation to make the system adaptable to real-life deployment scenarios. The approach is suitable for automating drone navigation in applications that involve regular trips or visits to the same locations, such as environmental and desertification monitoring, parcel/aid delivery and drone-based wireless internet delivery. In this case, the proposed algorithm replaces human operators, enhances the accuracy of GPS-based map navigation, alleviates problems related to GPS spoofing and enables navigation in GPS-denied environments. Our system is tested in two scenarios using the Unreal Engine-based AirSim plugin for drone simulation, with promising results: an average cross-track distance of less than 1.4 meters and a mean waypoint minimum distance of less than 1 meter.
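The waypoint minimum-distance metric reported above is not defined in the abstract. One plausible reading, assumed here, is the closest approach of the flown trajectory to each waypoint, averaged over all waypoints. A minimal sketch with hypothetical names and sample data:

```python
import math
from typing import List, Tuple

Point2 = Tuple[float, float]


def mean_waypoint_min_distance(trajectory: List[Point2], waypoints: List[Point2]) -> float:
    """For each waypoint, find how close the drone came to it, then average.

    Both lists must be non-empty; positions are 2D (x, y) in meters.
    """
    closest = [min(math.dist(w, p) for p in trajectory) for w in waypoints]
    return sum(closest) / len(closest)


waypoints = [(0.0, 0.0), (10.0, 0.0)]
flown = [(0.0, 0.5), (5.0, 0.0), (10.0, 1.5)]  # passes 0.5 m and 1.5 m from the waypoints
print(mean_waypoint_min_distance(flown, waypoints))  # prints 1.0
```

Unlike cross-track distance, this metric only checks whether each waypoint was actually visited, so a drone could score well here while drifting between waypoints.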

Submitted 5 May, 2019; originally announced May 2019.

Showing 1–10 of 10 results for author: ElHelw, M