-
PanSt3R: Multi-view Consistent Panoptic Segmentation
Authors:
Lojze Zust,
Yohann Cabon,
Juliette Marrie,
Leonid Antsfeld,
Boris Chidlovskii,
Jerome Revaud,
Gabriela Csurka
Abstract:
Panoptic segmentation of 3D scenes, involving the segmentation and classification of object instances in a dense 3D reconstruction of a scene, is a challenging problem, especially when relying solely on unposed 2D images. Existing approaches typically leverage off-the-shelf models to extract per-frame 2D panoptic segmentations, before optimizing an implicit geometric representation (often based on NeRF) to integrate and fuse the 2D predictions. We argue that relying on 2D panoptic segmentation for a problem inherently 3D and multi-view is likely suboptimal as it fails to leverage the full potential of spatial relationships across views. In addition to requiring camera parameters, these approaches also necessitate computationally expensive test-time optimization for each scene. Instead, in this work, we propose a unified and integrated approach PanSt3R, which eliminates the need for test-time optimization by jointly predicting 3D geometry and multi-view panoptic segmentation in a single forward pass. Our approach builds upon recent advances in 3D reconstruction, specifically upon MUSt3R, a scalable multi-view version of DUSt3R, and enhances it with semantic awareness and multi-view panoptic segmentation capabilities. We additionally revisit the standard post-processing mask merging procedure and introduce a more principled approach for multi-view segmentation. We also introduce a simple method for generating novel-view predictions based on the predictions of PanSt3R and vanilla 3DGS. Overall, the proposed PanSt3R is conceptually simple, yet fast and scalable, and achieves state-of-the-art performance on several benchmarks, while being orders of magnitude faster than existing methods.
Submitted 26 June, 2025;
originally announced June 2025.
-
3rd Workshop on Maritime Computer Vision (MaCVi) 2025: Challenge Results
Authors:
Benjamin Kiefer,
Lojze Žust,
Jon Muhovič,
Matej Kristan,
Janez Perš,
Matija Teršek,
Uma Mudenagudi Chaitra Desai,
Arnold Wiliem,
Marten Kreis,
Nikhil Akalwadi,
Yitong Quan,
Zhiqiang Zhong,
Zhe Zhang,
Sujie Liu,
Xuran Chen,
Yang Yang,
Matej Fabijanić,
Fausto Ferreira,
Seongju Lee,
Junseok Lee,
Kyoobin Lee,
Shanliang Yao,
Runwei Guan,
Xiaoyu Huang,
Yi Ni
, et al. (23 additional authors not shown)
Abstract:
The 3rd Workshop on Maritime Computer Vision (MaCVi) 2025 addresses maritime computer vision for Unmanned Surface Vehicles (USV) and underwater. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 700 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi25.
Submitted 17 January, 2025;
originally announced January 2025.
-
PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation
Authors:
Lojze Žust,
Matej Kristan
Abstract:
Panoptic segmentation is a fundamental task in computer vision and a crucial component for perception in autonomous vehicles. Recent mask-transformer-based methods achieve impressive performance on standard benchmarks but face significant challenges with small objects, crowded scenes and scenes exhibiting a wide range of object scales. We identify several fundamental shortcomings of the current approaches: (i) the query proposal generation process is biased towards larger objects, resulting in missed smaller objects, (ii) initially well-localized queries may drift to other objects, resulting in missed detections, (iii) spatially well-separated instances may be merged into a single mask causing inconsistent and false scene interpretations. To address these issues, we rethink the individual components of the network and its supervision, and propose a novel method for panoptic segmentation PanSR. PanSR effectively mitigates instance merging, enhances small-object detection and increases performance in crowded scenes, delivering a notable +3.4 PQ improvement over state-of-the-art on the challenging LaRS benchmark, while reaching state-of-the-art performance on Cityscapes. The code and models will be publicly available at https://github.com/lojzezust/PanSR.
Submitted 13 December, 2024;
originally announced December 2024.
-
MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion
Authors:
Bardienus Duisterhof,
Lojze Zust,
Philippe Weinzaepfel,
Vincent Leroy,
Yohann Cabon,
Jerome Revaud
Abstract:
Structure-from-Motion (SfM), a task aiming at jointly recovering camera poses and 3D geometry of a scene given a set of images, remains a hard problem with still many open challenges despite decades of significant progress. The traditional solution for SfM consists of a complex pipeline of minimal solvers which tends to propagate errors and fails when images do not sufficiently overlap, have too little motion, etc. Recent methods have attempted to revisit this paradigm, but we empirically show that they fall short of fixing these core issues. In this paper, we propose instead to build upon a recently released foundation model for 3D vision that can robustly produce local 3D reconstructions and accurate matches. We introduce a low-memory approach to accurately align these local reconstructions in a global coordinate system. We further show that such foundation models can serve as efficient image retrievers without any overhead, reducing the overall complexity from quadratic to linear. Overall, our novel SfM pipeline is simple, scalable, fast and truly unconstrained, i.e. it can handle any collection of images, ordered or not. Extensive experiments on multiple benchmarks show that our method provides steady performance across diverse settings, especially outperforming existing methods in small- and medium-scale settings.
Submitted 27 September, 2024;
originally announced September 2024.
-
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024
Authors:
Benjamin Kiefer,
Lojze Žust,
Matej Kristan,
Janez Perš,
Matija Teršek,
Arnold Wiliem,
Martin Messmer,
Cheng-Yen Yang,
Hsiang-Wei Huang,
Zhongyu Jiang,
Heng-Cheng Kuo,
Jie Mei,
Jenq-Neng Hwang,
Daniel Stadler,
Lars Sommer,
Kaer Huang,
Aiguo Zheng,
Weitu Chong,
Kanokphan Lertniphonphan,
Jun Xie,
Feng Chen,
Jian Li,
Zhepeng Wang,
Luca Zedda,
Andrea Loddo
, et al. (24 additional authors not shown)
Abstract:
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.
Submitted 23 November, 2023;
originally announced November 2023.
-
LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark
Authors:
Lojze Žust,
Janez Perš,
Matej Kristan
Abstract:
The progress in maritime obstacle detection is hindered by the lack of a diverse dataset that adequately captures the complexity of general maritime environments. We present the first maritime panoptic obstacle detection benchmark LaRS, featuring scenes from Lakes, Rivers and Seas. Our major contribution is the new dataset, which boasts the largest diversity in recording locations, scene types, obstacle classes, and acquisition conditions among the related datasets. LaRS is composed of over 4000 per-pixel labeled key frames with nine preceding frames to allow utilization of the temporal texture, amounting to over 40k frames. Each key frame is annotated with 8 thing, 3 stuff classes and 19 global scene attributes. We report the results of 27 semantic and panoptic segmentation methods, along with several performance insights and future research directions. To enable objective evaluation, we have implemented an online evaluation server. The LaRS dataset, evaluation toolkit and benchmark are publicly available at: https://lojzezust.github.io/lars-dataset
Submitted 18 August, 2023;
originally announced August 2023.
-
eWaSR -- an embedded-compute-ready maritime obstacle detection network
Authors:
Matija Teršek,
Lojze Žust,
Matej Kristan
Abstract:
Maritime obstacle detection is critical for safe navigation of autonomous surface vehicles (ASVs). While the accuracy of image-based detection methods has advanced substantially, their computational and memory requirements prohibit deployment on embedded devices. In this paper we analyze the currently best-performing maritime obstacle detection network WaSR. Based on the analysis we then propose replacements for the most computationally intensive stages and propose its embedded-compute-ready variant eWaSR. In particular, the new design follows the most recent advancements of transformer-based lightweight networks. eWaSR achieves comparable detection results to state-of-the-art WaSR with only 0.52% F1 score performance drop and outperforms other state-of-the-art embedded-ready architectures by over 9.74% in F1 score. On a standard GPU, eWaSR runs 10x faster than the original WaSR (115 FPS vs 11 FPS). Tests on a real embedded device OAK-D show that, while WaSR cannot run due to memory restrictions, eWaSR runs comfortably at 5.5 FPS. This makes eWaSR the first practical embedded-compute-ready maritime obstacle detection network. The source code and trained eWaSR models are publicly available here: https://github.com/tersekmatija/eWaSR.
Submitted 21 April, 2023;
originally announced April 2023.
-
1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results
Authors:
Benjamin Kiefer,
Matej Kristan,
Janez Perš,
Lojze Žust,
Fabio Poiesi,
Fabio Augusto de Alcantara Andrade,
Alexandre Bernardino,
Matthew Dawkins,
Jenni Raitoharju,
Yitong Quan,
Adem Atmaca,
Timon Höfer,
Qiming Zhang,
Yufei Xu,
Jing Zhang,
Dacheng Tao,
Lars Sommer,
Raphael Spraul,
Hangyue Zhao,
Hongpu Zhang,
Yanyun Zhao,
Jan Lukas Augustin,
Eui-ik Jeon,
Impyeong Lee,
Luca Zedda
, et al. (48 additional authors not shown)
Abstract:
The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
Submitted 28 November, 2022; v1 submitted 24 November, 2022;
originally announced November 2022.
-
Learning with Weak Annotations for Robust Maritime Obstacle Detection
Authors:
Lojze Žust,
Matej Kristan
Abstract:
Robust maritime obstacle detection is critical for safe navigation of autonomous boats and timely collision avoidance. The current state-of-the-art is based on deep segmentation networks trained on large datasets. However, per-pixel ground truth labeling of such datasets is labor-intensive and expensive. We propose a new scaffolding learning regime (SLR) that leverages weak annotations consisting of water edges, the horizon location, and obstacle bounding boxes to train segmentation-based obstacle detection networks, thereby reducing the required ground truth labeling effort by a factor of twenty. SLR trains an initial model from weak annotations and then alternates between re-estimating the segmentation pseudo-labels and improving the network parameters. Experiments show that maritime obstacle segmentation networks trained using SLR on weak annotations not only match but outperform the same networks trained with dense ground truth labels, which is a remarkable result. In addition to the increased accuracy, SLR also increases domain generalization and can be used for domain adaptation with a low manual annotation load. The SLR code and pre-trained models are available at https://github.com/lojzezust/SLR .
Submitted 25 November, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Temporal Context for Robust Maritime Obstacle Detection
Authors:
Lojze Žust,
Matej Kristan
Abstract:
Robust maritime obstacle detection is essential for fully autonomous unmanned surface vehicles (USVs). The currently widely adopted segmentation-based obstacle detection methods are prone to misclassification of object reflections and sun glitter as obstacles, producing many false positive detections, effectively rendering the methods impractical for USV navigation. However, water-turbulence-induced temporal appearance changes on object reflections are very distinct from the appearance dynamics of true objects. We harness this property to design WaSR-T, a novel maritime obstacle detection network that extracts the temporal context from a sequence of recent frames to reduce ambiguity. By learning the local temporal characteristics of object reflection on the water surface, WaSR-T substantially improves obstacle detection accuracy in the presence of reflections and glitter. Compared with existing single-frame methods, WaSR-T reduces the number of false positive detections by 41% overall and by over 53% within the danger zone of the boat, while preserving a high recall, and achieving new state-of-the-art performance on the challenging MODS maritime obstacle detection benchmark. The code, pretrained models and extended datasets are available at https://github.com/lojzezust/WaSR-T
Submitted 3 August, 2022; v1 submitted 10 March, 2022;
originally announced March 2022.
-
Learning Maritime Obstacle Detection from Weak Annotations by Scaffolding
Authors:
Lojze Žust,
Matej Kristan
Abstract:
Coastal water autonomous boats rely on robust perception methods for obstacle detection and timely collision avoidance. The current state-of-the-art is based on deep segmentation networks trained on large datasets. Per-pixel ground truth labeling of such datasets, however, is labor-intensive and expensive. We observe that far less information is required for practical obstacle avoidance: the location of the water edge on static obstacles such as the shore and the approximate location and bounds of dynamic obstacles in the water are sufficient to plan a reaction. We propose a new scaffolding learning regime (SLR) that allows training obstacle detection segmentation networks only from such weak annotations, thus significantly reducing the cost of ground-truth labeling. Experiments show that maritime obstacle segmentation networks trained using SLR substantially outperform the same networks trained with dense ground truth labels. Thus accuracy is not sacrificed for labeling simplicity but is in fact improved, which is a remarkable result.
Submitted 1 August, 2021;
originally announced August 2021.