TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data

Authors: Naor Cohen, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space. However, these approaches rely on 3D point clouds, which present challenges in encoding efficiency and neural network processing. With the advent of advanced LiDAR sensors like Ouster OS1, which, in addition to 3D point clouds, produce fixed resolution depth, signal, and ambient panoramic 2D images, new opportunities emerge for LiDAR based tasks. In this work, we propose an alternative approach to connect LiDAR data with text by leveraging 2D imagery generated by the OS1 sensor instead of 3D point clouds. Using the Florence 2 large model in a zero-shot setting, we perform image captioning and object detection. Our experiments demonstrate that Florence 2 generates more informative captions and achieves superior performance in object detection tasks compared to existing methods like CLIP. By combining advanced LiDAR sensor data with a large pre-trained model, our approach provides a robust and accurate solution for challenging detection scenarios, including real-time applications requiring high accuracy and robustness.

Submitted 21 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

arXiv:2501.06235 [pdf, other]

NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data

Authors: Nirit Alkalay, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: 4D panoptic LiDAR segmentation is essential for scene understanding in autonomous driving and robotics, combining semantic and instance segmentation with temporal consistency. Current methods, like 4D-PLS and 4D-STOP, use a tracking-by-detection methodology, employing deep learning networks to perform semantic and instance segmentation on each frame. To maintain temporal consistency, large-size instances detected in the current frame are compared and associated with instances within a temporal window that includes the current and preceding frames. However, their reliance on short-term instance detection, lack of motion estimation, and exclusion of small-sized instances lead to frequent identity switches and reduced tracking performance. We address these issues with the NextStop tracker, which integrates Kalman filter-based motion estimation, data association, and lifespan management, along with a tracklet state concept to improve prioritization. Evaluated using the LiDAR Segmentation and Tracking Quality (LSTQ) metric on the SemanticKITTI validation set, NextStop demonstrated enhanced tracking performance, particularly for small-sized objects like people and bicyclists, with fewer ID switches, earlier tracking initiation, and improved reliability in complex environments. The source code is available at https://github.com/AIROTAU/NextStop

Submitted 24 March, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

arXiv:2412.05594 [pdf, other]

Real-Time 3D Object Detection Using InnovizOne LiDAR and Low-Power Hailo-8 AI Accelerator

Authors: Itay Krispin-Avraham, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: Object detection is a significant field in autonomous driving. Popular sensors for this task include cameras and LiDAR sensors. LiDAR sensors offer several advantages, such as insensitivity to lighting changes (for example, in dark settings) and the ability to provide 3D information in the form of point clouds, which include the ranges of objects. However, 3D detection methods, such as PointPillars, typically require high-power hardware. Additionally, most common spinning LiDARs are sparse and may not achieve the desired quality of object detection in front of the car. In this paper, we present the feasibility of performing real-time 3D object detection of cars using 3D point clouds from a LiDAR sensor, processed and deployed on a low-power Hailo-8 AI accelerator. The LiDAR sensor used in this study is the InnovizOne sensor, which captures objects in higher quality compared to spinning LiDAR techniques, especially for distant objects. We successfully achieved real-time inference at a rate of approximately 5Hz with a high accuracy of 0.91% F1 score, with only -0.2% degradation compared to running the same model on an NVIDIA GeForce RTX 2080 Ti. This work demonstrates that effective real-time 3D object detection can be achieved on low-cost, low-power hardware, representing a significant step towards more accessible autonomous driving technologies. The source code and the pre-trained models are available at https://github.com/AIROTAU/PointPillarsHailoInnoviz/tree/main

Submitted 7 December, 2024; originally announced December 2024.

arXiv:2309.15204 [pdf, other]

CLRmatchNet: Enhancing Curved Lane Detection with Deep Matching Process

Authors: Sapir Kontente, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: Lane detection plays a crucial role in autonomous driving by providing vital data to ensure safe navigation. Modern algorithms rely on anchor-based detectors, which are then followed by a label-assignment process to categorize training detections as positive or negative instances based on learned geometric attributes. Accurate label assignment has a great impact on model performance and usually relies on a pre-defined classical cost function evaluating GT-prediction alignment. However, classical label assignment methods face limitations due to their reliance on predefined cost functions derived from low-dimensional models, potentially impacting their optimality. Our research introduces MatchNet, a deep learning submodule-based approach aimed at improving the label assignment process. Integrated into a state-of-the-art lane detection network such as the Cross Layer Refinement Network for Lane Detection (CLRNet), MatchNet replaces the conventional label assignment process with a submodule network. The integrated model, CLRmatchNet, surpasses CLRNet, showing substantial improvements in scenarios involving curved lanes, with remarkable improvement across all backbones of +2.8% for ResNet34, +2.3% for ResNet101, and +2.96% for DLA34. In addition, it maintains or even improves comparable results in other sections. Our method boosts the confidence level in lane detection, allowing an increase in the confidence threshold. Our code is available at: https://github.com/sapirkontente/CLRmatchNet.git

Submitted 31 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2210.13570 [pdf, other]

Strong-TransCenter: Improved Multi-Object Tracking based on Transformers with Dense Representations

Authors: Amit Galor, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: Transformer networks have been a focus of research in many fields in recent years, being able to surpass the state-of-the-art performance in different computer vision tasks. However, in the task of Multiple Object Tracking (MOT), leveraging the power of Transformers remains relatively unexplored. Among the pioneering efforts in this domain, TransCenter, a Transformer-based MOT architecture with dense object queries, demonstrated exceptional tracking capabilities while maintaining reasonable runtime. Nonetheless, one critical aspect in MOT, track displacement estimation, presents room for enhancement to further reduce association errors. In response to this challenge, our paper introduces a novel improvement to TransCenter. We propose a post-processing mechanism grounded in the Track-by-Detection paradigm, aiming to refine the track displacement estimation. Our approach involves the integration of a carefully designed Kalman filter, which incorporates Transformer outputs into measurement error estimation, and the use of an embedding network for target re-identification. This combined strategy yields substantial improvement in the accuracy and robustness of the tracking process. We validate our contributions through comprehensive experiments on the MOTChallenge datasets MOT17 and MOT20, where our proposed approach outperforms other Transformer-based trackers. The code is publicly available at: https://github.com/amitgalor18/STC_Tracker

Submitted 21 December, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2206.14651 [pdf, other]

BoT-SORT: Robust Associations Multi-Pedestrian Tracking

Authors: Nir Aharon, Roy Orfaig, Ben-Zion Bobrovsky

Abstract: The goal of multi-object tracking (MOT) is detecting and tracking all the objects in a scene, while keeping a unique identifier for each object. In this paper, we present a new robust state-of-the-art tracker, which can combine the advantages of motion and appearance information, along with camera-motion compensation, and a more accurate Kalman filter state vector. Our new trackers, BoT-SORT and BoT-SORT-ReID, rank first in the datasets of MOTChallenge [29, 11] on both MOT17 and MOT20 test sets, in terms of all the main MOT metrics: MOTA, IDF1, and HOTA. For MOT17: 80.5 MOTA, 80.2 IDF1, and 65.0 HOTA are achieved. The source code and the pre-trained models are available at https://github.com/NirAharon/BOT-SORT

Submitted 7 July, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

arXiv:2009.11342 [pdf, other]

Insights on Evaluation of Camera Re-localization Using Relative Pose Regression

Authors: Amir Shalev, Omer Achrack, Brian Fulkerson, Ben-Zion Bobrovsky

Abstract: We consider the problem of relative pose regression in visual relocalization. Recently, several promising approaches have emerged in this area. We claim that even though they report results on the same datasets using the same train and test split, a faithful comparison between them was not available, since on the currently used evaluation metric some approaches might appear to perform favorably while in reality performing worse. We reveal a tradeoff between accuracy and the 3D volume of the regressed subspace. We believe that, unlike other relocalization approaches, in the case of relative pose regression the regressed subspace 3D volume is less dependent on the scene and more affected by the method used to score the overlap, which determines how closely sampled the viewpoints are. We propose three new metrics to remedy the issue mentioned above. The proposed metrics incorporate statistics about the regression subspace volume. We also propose a new pose regression network that serves as a new baseline for this task. We compare the performance of our trained model on the Microsoft 7-Scenes and Cambridge Landmarks datasets with both the standard metrics and the newly proposed metrics, and adjust the overlap score to reveal the tradeoff between the subspace and performance. The results show that the proposed metrics are more robust to different overlap thresholds than the conventional approaches. Finally, we show that our network generalizes well: specifically, training on a single scene leads to little loss of performance on the other scenes.

Submitted 23 September, 2020; originally announced September 2020.

Comments: Accepted at ECCV 2020 joint workshop of UAVision and VisDrone

arXiv:2003.08362 [pdf]

Neural Network Tracking of Moving Objects with Unknown Equations of Motion

Authors: Boaz Fish, Ben Zion Bobrovsky

Abstract: In this paper we present a Neural Network design that can be used to track the location of a moving object within a given range based on noisy measurements of the object's coordinates. This function is commonly performed by the Kalman filter; our goal is to show that our method outperforms the Kalman filter in certain scenarios.

Submitted 13 March, 2020; originally announced March 2020.

arXiv:0802.0414 [pdf, ps, other]

The exit problem in optimal non-causal extimation

Authors: Doron Ezri, Ben-Tzion Bobrovsky, Zeev Schuss

Abstract: We study the phenomenon of loss of lock in the optimal non-causal phase estimation problem, a benchmark problem in nonlinear estimation. Our method is based on the computation of the asymptotic distribution of the optimal estimation error in case the number of trajectories in the optimization problem is finite. The computation is based directly on the minimum noise energy optimality criterion rather than on state equations of the error, as is the usual case in the literature. The results include an asymptotic computation of the mean time to lose lock (MTLL) in the optimal smoother. We show that the MTLL in the first and second order smoothers is significantly longer than that in the causal extended Kalman filter.

Submitted 4 February, 2008; originally announced February 2008.

Comments: Loss of lock in nonlinear smoothers

MSC Class: 60G35; 62M09; 93E10

arXiv:0802.0130 [pdf, ps, other]

About the true type of smoothers

Authors: D. Ezri, B. Z. Bobrovsky, Z. Schuss

Abstract: We employ the variational formulation and the Euler-Lagrange equations to study the steady-state error in linear non-causal estimators (smoothers). We give a complete description of the steady-state error for inputs that are polynomial in time. We show that the steady-state error regime in a smoother is similar to that in a filter of double the type. This means that the steady-state error in the optimal smoother is significantly smaller than that in the Kalman filter. The results reveal a significant advantage of smoothing over filtering with respect to robustness to model uncertainty.

Submitted 1 February, 2008; originally announced February 2008.

Comments: Non-causal estimation

MSC Class: 60G35; 93E10; 94A05

Showing 1–10 of 10 results for author: Bobrovsky, B