Search | arXiv e-print repository

TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data

Authors: Naor Cohen, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space. However, these approaches rely on 3D point clouds, which present challenges in encoding efficiency and neural network processing. With the advent of advanced LiDAR sensors like Ouster OS1, which, in addition to 3D point clouds, produce fixed resolution depth,… ▽ More Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space. However, these approaches rely on 3D point clouds, which present challenges in encoding efficiency and neural network processing. With the advent of advanced LiDAR sensors like Ouster OS1, which, in addition to 3D point clouds, produce fixed resolution depth, signal, and ambient panoramic 2D images, new opportunities emerge for LiDAR based tasks. In this work, we propose an alternative approach to connect LiDAR data with text by leveraging 2D imagery generated by the OS1 sensor instead of 3D point clouds. Using the Florence 2 large model in a zero-shot setting, we perform image captioning and object detection. Our experiments demonstrate that Florence 2 generates more informative captions and achieves superior performance in object detection tasks compared to existing methods like CLIP. By combining advanced LiDAR sensor data with a large pre-trained model, our approach provides a robust and accurate solution for challenging detection scenarios, including real-time applications requiring high accuracy and robustness. △ Less

Submitted 21 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

arXiv:2501.06235 [pdf, other]

NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data

Authors: Nirit Alkalay, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: 4D panoptic LiDAR segmentation is essential for scene understanding in autonomous driving and robotics, combining semantic and instance segmentation with temporal consistency. Current methods, like 4D-PLS and 4D-STOP, use a tracking-by-detection methodology, employing deep learning networks to perform semantic and instance segmentation on each frame. To maintain temporal consistency, large-size in… ▽ More 4D panoptic LiDAR segmentation is essential for scene understanding in autonomous driving and robotics, combining semantic and instance segmentation with temporal consistency. Current methods, like 4D-PLS and 4D-STOP, use a tracking-by-detection methodology, employing deep learning networks to perform semantic and instance segmentation on each frame. To maintain temporal consistency, large-size instances detected in the current frame are compared and associated with instances within a temporal window that includes the current and preceding frames. However, their reliance on short-term instance detection, lack of motion estimation, and exclusion of small-sized instances lead to frequent identity switches and reduced tracking performance. We address these issues with the NextStop1 tracker, which integrates Kalman filter-based motion estimation, data association, and lifespan management, along with a tracklet state concept to improve prioritization. Evaluated using the LiDAR Segmentation and Tracking Quality (LSTQ) metric on the SemanticKITTI validation set, NextStop demonstrated enhanced tracking performance, particularly for small-sized objects like people and bicyclists, with fewer ID switches, earlier tracking initiation, and improved reliability in complex environments. The source code is available at https://github.com/AIROTAU/NextStop △ Less

Submitted 24 March, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

arXiv:2412.05594 [pdf, other]

Real-Time 3D Object Detection Using InnovizOne LiDAR and Low-Power Hailo-8 AI Accelerator

Authors: Itay Krispin-Avraham, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: Object detection is a significant field in autonomous driving. Popular sensors for this task include cameras and LiDAR sensors. LiDAR sensors offer several advantages, such as insensitivity to light changes, like in a dark setting and the ability to provide 3D information in the form of point clouds, which include the ranges of objects. However, 3D detection methods, such as PointPillars, typicall… ▽ More Object detection is a significant field in autonomous driving. Popular sensors for this task include cameras and LiDAR sensors. LiDAR sensors offer several advantages, such as insensitivity to light changes, like in a dark setting and the ability to provide 3D information in the form of point clouds, which include the ranges of objects. However, 3D detection methods, such as PointPillars, typically require high-power hardware. Additionally, most common spinning LiDARs are sparse and may not achieve the desired quality of object detection in front of the car. In this paper, we present the feasibility of performing real-time 3D object detection of cars using 3D point clouds from a LiDAR sensor, processed and deployed on a low-power Hailo-8 AI accelerator. The LiDAR sensor used in this study is the InnovizOne sensor, which captures objects in higher quality compared to spinning LiDAR techniques, especially for distant objects. We successfully achieved real-time inference at a rate of approximately 5Hz with a high accuracy of 0.91% F1 score, with only -0.2% degradation compared to running the same model on an NVIDIA GeForce RTX 2080 Ti. This work demonstrates that effective real-time 3D object detection can be achieved on low-cost, low-power hardware, representing a significant step towards more accessible autonomous driving technologies. The source code and the pre-trained models are available at https://github.com/AIROTAU/ PointPillarsHailoInnoviz/tree/main △ Less

Submitted 7 December, 2024; originally announced December 2024.

arXiv:2309.15204 [pdf, other]

CLRmatchNet: Enhancing Curved Lane Detection with Deep Matching Process

Authors: Sapir Kontente, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: Lane detection plays a crucial role in autonomous driving by providing vital data to ensure safe navigation. Modern algorithms rely on anchor-based detectors, which are then followed by a label-assignment process to categorize training detections as positive or negative instances based on learned geometric attributes. Accurate label assignment has great impact on the model performance, that is usu… ▽ More Lane detection plays a crucial role in autonomous driving by providing vital data to ensure safe navigation. Modern algorithms rely on anchor-based detectors, which are then followed by a label-assignment process to categorize training detections as positive or negative instances based on learned geometric attributes. Accurate label assignment has great impact on the model performance, that is usually relying on a pre-defined classical cost function evaluating GT-prediction alignment. However, classical label assignment methods face limitations due to their reliance on predefined cost functions derived from low-dimensional models, potentially impacting their optimality. Our research introduces MatchNet, a deep learning submodule-based approach aimed at improving the label assignment process. Integrated into a state-of-the-art lane detection network such as the Cross Layer Refinement Network for Lane Detection (CLRNet), MatchNet replaces the conventional label assignment process with a submodule network. The integrated model, CLRmatchNet, surpasses CLRNet, showing substantial improvements in scenarios involving curved lanes, with remarkable improvement across all backbones of +2.8% for ResNet34, +2.3% for ResNet101, and +2.96% for DLA34. In addition, it maintains or even improves comparable results in other sections. Our method boosts the confidence level in lane detection, allowing an increase in the confidence threshold. Our code is available at: https://github.com/sapirkontente/CLRmatchNet.git △ Less

Submitted 31 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2210.13570 [pdf, other]

Strong-TransCenter: Improved Multi-Object Tracking based on Transformers with Dense Representations

Authors: Amit Galor, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: Transformer networks have been a focus of research in many fields in recent years, being able to surpass the state-of-the-art performance in different computer vision tasks. However, in the task of Multiple Object Tracking (MOT), leveraging the power of Transformers remains relatively unexplored. Among the pioneering efforts in this domain, TransCenter, a Transformer-based MOT architecture with de… ▽ More Transformer networks have been a focus of research in many fields in recent years, being able to surpass the state-of-the-art performance in different computer vision tasks. However, in the task of Multiple Object Tracking (MOT), leveraging the power of Transformers remains relatively unexplored. Among the pioneering efforts in this domain, TransCenter, a Transformer-based MOT architecture with dense object queries, demonstrated exceptional tracking capabilities while maintaining reasonable runtime. Nonetheless, one critical aspect in MOT, track displacement estimation, presents room for enhancement to further reduce association errors. In response to this challenge, our paper introduces a novel improvement to TransCenter. We propose a post-processing mechanism grounded in the Track-by-Detection paradigm, aiming to refine the track displacement estimation. Our approach involves the integration of a carefully designed Kalman filter, which incorporates Transformer outputs into measurement error estimation, and the use of an embedding network for target re-identification. This combined strategy yields substantial improvement in the accuracy and robustness of the tracking process. We validate our contributions through comprehensive experiments on the MOTChallenge datasets MOT17 and MOT20, where our proposed approach outperforms other Transformer-based trackers. The code is publicly available at: https://github.com/amitgalor18/STC_Tracker △ Less

Submitted 21 December, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2206.14651 [pdf, other]

BoT-SORT: Robust Associations Multi-Pedestrian Tracking

Authors: Nir Aharon, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: The goal of multi-object tracking (MOT) is detecting and tracking all the objects in a scene, while keeping a unique identifier for each object. In this paper, we present a new robust state-of-the-art tracker, which can combine the advantages of motion and appearance information, along with camera-motion compensation, and a more accurate Kalman filter state vector. Our new trackers BoT-SORT, and B… ▽ More The goal of multi-object tracking (MOT) is detecting and tracking all the objects in a scene, while keeping a unique identifier for each object. In this paper, we present a new robust state-of-the-art tracker, which can combine the advantages of motion and appearance information, along with camera-motion compensation, and a more accurate Kalman filter state vector. Our new trackers BoT-SORT, and BoT-SORT-ReID rank first in the datasets of MOTChallenge [29, 11] on both MOT17 and MOT20 test sets, in terms of all the main MOT metrics: MOTA, IDF1, and HOTA. For MOT17: 80.5 MOTA, 80.2 IDF1, and 65.0 HOTA are achieved. The source code and the pre-trained models are available at https://github.com/NirAharon/BOT-SORT △ Less

Submitted 7 July, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

Showing 1–6 of 6 results for author: Orfaig, R