Showing 1–21 of 21 results for author: de Geus, D

  1. arXiv:2506.06928  [pdf, ps, other]

    cs.CV

    How Important are Videos for Training Video LLMs?

    Authors: George Lydakis, Alexander Hermans, Ali Athar, Daan de Geus, Bastian Leibe

    Abstract: Research into Video Large Language Models (LLMs) has progressed rapidly, with numerous models and benchmarks emerging in just a few years. Typically, these models are initialized with a pretrained text-only LLM and finetuned on both image- and video-caption datasets. In this paper, we present findings indicating that Video LLMs are more capable of temporal reasoning after image-only training than…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Project page on https://visualcomputinginstitute.github.io/videollm-pseudovideo-training/

  2. arXiv:2506.06854  [pdf, ps, other]

    cs.CV

    DONUT: A Decoder-Only Model for Trajectory Prediction

    Authors: Markus Knoche, Daan de Geus, Bastian Leibe

    Abstract: Predicting the motion of other agents in a scene is highly relevant for autonomous driving, as it allows a self-driving car to anticipate. Inspired by the success of decoder-only models for language modeling, we propose DONUT, a Decoder-Only Network for Unrolling Trajectories. Different from existing encoder-decoder forecasting models, we encode historical trajectories and predict future trajector…

    Submitted 7 June, 2025; originally announced June 2025.

  3. arXiv:2503.19108  [pdf, other]

    cs.CV

    Your ViT is Secretly an Image Segmentation Model

    Authors: Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, Daan de Geus

    Abstract: Vision Transformers (ViTs) have shown remarkable performance and scalability across various computer vision tasks. To apply single-scale ViTs to image segmentation, existing methods adopt a convolutional adapter to generate multi-scale features, a pixel decoder to fuse these features, and a Transformer decoder that uses the fused features to make predictions. In this paper, we show that the induct…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Code: https://www.tue-mps.org/eomt/

  4. arXiv:2503.18944  [pdf, other]

    cs.CV

    DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation

    Authors: Karim Abou Zeid, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, Bastian Leibe

    Abstract: Vision foundation models (VFMs) trained on large-scale image datasets provide high-quality features that have significantly advanced 2D visual recognition. However, their potential in 3D vision remains largely untapped, despite the common availability of 2D images alongside 3D point cloud datasets. While significant research has been dedicated to 2D-3D fusion, recent state-of-the-art 3D methods pr…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Project page at https://vision.rwth-aachen.de/DITR

  5. arXiv:2409.17208  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    First Place Solution to the ECCV 2024 BRAVO Challenge: Evaluating Robustness of Vision Foundation Models for Semantic Segmentation

    Authors: Tommie Kerssies, Daan de Geus, Gijs Dubbelman

    Abstract: In this report, we present the first place solution to the ECCV 2024 BRAVO Challenge, where a model is trained on Cityscapes and its robustness is evaluated on several out-of-distribution datasets. Our solution leverages the powerful representations learned by vision foundation models, by attaching a simple segmentation decoder to DINOv2 and fine-tuning the entire model. This approach outperforms…

    Submitted 8 October, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: v2 fixes ECE and FPR@95, among other small changes. arXiv admin note: substantial text overlap with arXiv:2409.15107

  6. arXiv:2409.15107  [pdf, other]

    cs.CV cs.AI cs.LG

    The BRAVO Semantic Segmentation Challenge Results in UNCV2024

    Authors: Tuan-Hung Vu, Eduardo Valle, Andrei Bursuc, Tommie Kerssies, Daan de Geus, Gijs Dubbelman, Long Qian, Bingke Zhu, Yingying Chen, Ming Tang, Jinqiao Wang, Tomáš Vojíř, Jan Šochman, Jiří Matas, Michael Smith, Frank Ferrie, Shamik Basu, Christos Sakaridis, Luc Van Gool

    Abstract: We propose the unified BRAVO challenge to benchmark the reliability of semantic segmentation models under realistic perturbations and unknown out-of-distribution (OOD) scenarios. We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to…

    Submitted 9 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: ECCV 2024 proceeding paper of the BRAVO challenge 2024; see https://benchmarks.elsa-ai.eu/?ch=1&com=introduction. Corrected numbers in Tables 1, 3, 4, 5 and 10.

  7. arXiv:2409.11355  [pdf, other]

    cs.CV

    Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

    Authors: Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe

    Abstract: Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results, high computational demands due to multi-step inference limited its use in many scenarios. In this paper, we show that the perceived inefficiency was caused by…

    Submitted 19 March, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: WACV 2025 Oral. Project page at https://vision.rwth-aachen.de/diffusion-e2e-ft

  8. arXiv:2406.10114  [pdf, other]

    cs.CV

    Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

    Authors: Daan de Geus, Gijs Dubbelman

    Abstract: Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object. Existing methods approach PPS by separately conducting object-level and part-level segmentation. However, their part-level predictions are not link…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. Project page and code: https://tue-mps.github.io/tapps/

  9. arXiv:2406.09936  [pdf, other]

    cs.CV

    ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

    Authors: Narges Norouzi, Svetlana Orlova, Daan de Geus, Gijs Dubbelman

    Abstract: This work presents Adaptive Local-then-Global Merging (ALGM), a token reduction method for semantic segmentation networks that use plain Vision Transformers. ALGM merges tokens in two stages: (1) In the first network layer, it merges similar tokens within a small local window and (2) halfway through the network, it merges similar tokens across the entire image. This is motivated by an analysis in…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. Project page and code: https://tue-mps.github.io/ALGM

  10. arXiv:2406.09896  [pdf, other]

    cs.CV

    Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation

    Authors: Brunó B. Englert, Fabrizio J. Piva, Tommie Kerssies, Daan de Geus, Gijs Dubbelman

    Abstract: Achieving robust generalization across diverse data domains remains a significant challenge in computer vision. This challenge is important in safety-critical applications, where deep-neural-network-based systems must perform reliably under various environmental conditions not seen during training. Our study investigates whether the generalization capabilities of Vision Foundation Models (VFMs) an…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Workshop Proceedings for the Second Workshop on Foundation Models

  11. arXiv:2404.12172  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    How to Benchmark Vision Foundation Models for Semantic Segmentation?

    Authors: Tommie Kerssies, Daan de Geus, Gijs Dubbelman

    Abstract: Recent vision foundation models (VFMs) have demonstrated proficiency in various tasks but require supervised fine-tuning to perform the task of semantic segmentation effectively. Benchmarking their performance is essential for selecting current models and guiding future model developments for this task. The lack of a standardized benchmark complicates comparisons. Therefore, the primary objective…

    Submitted 10 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop Proceedings for the Second Workshop on Foundation Models. v2 updates image normalization preprocessing for linear probing with EVA-02, EVA-02-CLIP, SigLIP, DFN (the impact on end-to-end fine-tuning is negligible; no changes made)

  12. arXiv:2306.02095  [pdf, other]

    cs.CV

    Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers

    Authors: Chenyang Lu, Daan de Geus, Gijs Dubbelman

    Abstract: This paper introduces Content-aware Token Sharing (CTS), a token reduction approach that improves the computational efficiency of semantic segmentation networks that use Vision Transformers (ViTs). Existing works have proposed token reduction approaches to improve the efficiency of ViT-based image classification networks, but these methods are not directly applicable to semantic segmentation, whic…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: CVPR 2023. Project page and code: https://tue-mps.github.io/CTS/

  13. Intra-Batch Supervision for Panoptic Segmentation on High-Resolution Images

    Authors: Daan de Geus, Gijs Dubbelman

    Abstract: Unified panoptic segmentation methods are achieving state-of-the-art results on several datasets. To achieve these results on high-resolution datasets, these methods apply crop-based training. In this work, we find that, although crop-based training is advantageous in general, it also has a harmful side-effect. Specifically, it limits the ability of unified networks to discriminate between large o…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: WACV 2023. Project page and code: https://ddegeus.github.io/intra-batch-supervision/

  14. arXiv:2304.01447  [pdf, other]

    cs.MA cs.AI cs.LG

    Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning

    Authors: Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman

    Abstract: Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a reasoning paradigm where agents anticipate the learning steps of other agents to improve cooperation among themselves. As MARL uses gradient-based optimization, learning anticipation requires using Higher-Order Gradients (HOG), with so-called HOG methods. Existing HOG methods are based on policy parameter anticipation, i.e., a…

    Submitted 3 April, 2023; originally announced April 2023.

  15. arXiv:2303.08307  [pdf, other]

    cs.MA

    Coordinating Fully-Cooperative Agents Using Hierarchical Learning Anticipation

    Authors: Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman

    Abstract: Learning anticipation is a reasoning paradigm in multi-agent reinforcement learning, where agents, during learning, consider the anticipated learning of other agents. There has been substantial research into the role of learning anticipation in improving cooperation among self-interested agents in general-sum games. Two primary examples are Learning with Opponent-Learning Awareness (LOLA), which a…

    Submitted 2 April, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: AAMAS 2023 Workshop on Optimization and Learning in Multi-Agent Systems

  16. Proactive Risk Navigation System for Real-World Urban Intersections

    Authors: Tim Puphal, Benedict Flade, Daan de Geus, Julian Eggert

    Abstract: We consider the problem of intelligently navigating through complex traffic. Urban situations are defined by the underlying map structure and special regulatory objects of e.g. a stop line or crosswalk. Thereon dynamic vehicles (cars, bicycles, etc.) move forward, while trying to keep accident risks low. Especially at intersections, the combination and interaction of traffic elements is diverse…

    Submitted 14 March, 2023; originally announced March 2023.

    Journal ref: International Conference on Intelligent Transportation Systems (ITSC 2020)

  17. arXiv:2106.06351  [pdf, other]

    cs.CV

    Part-aware Panoptic Segmentation

    Authors: Daan de Geus, Panagiotis Meletis, Chenyang Lu, Xiaoxiao Wen, Gijs Dubbelman

    Abstract: In this work, we introduce the new scene understanding task of Part-aware Panoptic Segmentation (PPS), which aims to understand a scene at multiple levels of abstraction, and unifies the tasks of scene parsing and part parsing. For this novel task, we provide consistent annotations on two commonly used datasets: Cityscapes and Pascal VOC. Moreover, we present a single metric to evaluate PPS, calle…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: CVPR 2021. Code and data: https://github.com/tue-mps/panoptic_parts

  18. arXiv:2004.07944  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV

    Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding

    Authors: Panagiotis Meletis, Xiaoxiao Wen, Chenyang Lu, Daan de Geus, Gijs Dubbelman

    Abstract: In this technical report, we present two novel datasets for image scene understanding. Both datasets have annotations compatible with panoptic segmentation and additionally they have part-level labels for selected semantic classes. This report describes the format of the two datasets, the annotation protocols, the merging strategies, and presents the datasets statistics. The datasets labels togeth…

    Submitted 16 April, 2020; originally announced April 2020.

  19. arXiv:1910.03892  [pdf, other]

    cs.CV

    Fast Panoptic Segmentation Network

    Authors: Daan de Geus, Panagiotis Meletis, Gijs Dubbelman

    Abstract: In this work, we present an end-to-end network for fast panoptic segmentation. This network, called Fast Panoptic Segmentation Network (FPSNet), does not require computationally costly instance mask predictions or merging heuristics. This is achieved by casting the panoptic task into a custom dense pixel-wise classification task, which assigns a class label or an instance id to each pixel. We eval…

    Submitted 9 October, 2019; originally announced October 2019.

  20. arXiv:1902.02678  [pdf, other]

    cs.CV

    Single Network Panoptic Segmentation for Street Scene Understanding

    Authors: Daan de Geus, Panagiotis Meletis, Gijs Dubbelman

    Abstract: In this work, we propose a single deep neural network for panoptic segmentation, for which the goal is to provide each individual pixel of an input image with a class label, as in semantic segmentation, as well as a unique identifier for specific objects in an image, following instance segmentation. Our network makes joint semantic and instance segmentation predictions and combines these to form a…

    Submitted 7 February, 2019; originally announced February 2019.

  21. arXiv:1809.02110  [pdf, other]

    cs.CV

    Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network

    Authors: Daan de Geus, Panagiotis Meletis, Gijs Dubbelman

    Abstract: We present a single network method for panoptic segmentation. This method combines the predictions from a jointly trained semantic and instance segmentation network using heuristics. Joint training is the first step towards an end-to-end panoptic segmentation network and is faster and more memory efficient than training and predicting with two networks, as done in previous work. The architecture c…

    Submitted 7 February, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: Technical report