Showing 1–50 of 150 results for author: Tuytelaars, T

  1. arXiv:2506.11024  [pdf, ps, other]

    cs.LG cs.AI cs.DC

    Not All Clients Are Equal: Personalized Federated Learning on Heterogeneous Multi-Modal Clients

    Authors: Minhyuk Seo, Taeheon Kim, Hankook Lee, Jonghyun Choi, Tinne Tuytelaars

    Abstract: Foundation models have shown remarkable capabilities across diverse multi-modal tasks, but their centralized training raises privacy concerns and induces high transmission costs. In contrast, federated learning (FL) offers a distributed alternative without the need to share data. Recently, for the growing demand for personalizing AI models for different user purposes, personalized federated learni…

    Submitted 20 May, 2025; originally announced June 2025.

  2. arXiv:2506.02011  [pdf, ps, other]

    cs.CV

    OASIS: Online Sample Selection for Continual Visual Instruction Tuning

    Authors: Minjae Lee, Minhyuk Seo, Tingyu Qu, Tinne Tuytelaars, Jonghyun Choi

    Abstract: In continual visual instruction tuning (CVIT) scenarios, where multi-modal data continuously arrive in an online streaming manner, training delays from large-scale data significantly hinder real-time adaptation. While existing data selection strategies reduce training overheads, they rely on pre-trained reference models, which are impractical in CVIT setups due to unknown future data. Recent refer…

    Submitted 27 May, 2025; originally announced June 2025.

  3. arXiv:2504.18954  [pdf, other]

    eess.IV cs.AI cs.CV

    Surgeons vs. Computer Vision: A comparative analysis on surgical phase recognition capabilities

    Authors: Marco Mezzina, Pieter De Backer, Tom Vercauteren, Matthew Blaschko, Alexandre Mottrie, Tinne Tuytelaars

    Abstract: Purpose: Automated Surgical Phase Recognition (SPR) uses Artificial Intelligence (AI) to segment the surgical workflow into its key events, functioning as a building block for efficient video review, surgical education as well as skill assessment. Previous research has focused on short and linear surgical procedures and has not explored if temporal context influences experts' ability to better cla…

    Submitted 26 April, 2025; originally announced April 2025.

  4. arXiv:2504.18468  [pdf, other]

    cs.CV

    RGS-DR: Reflective Gaussian Surfels with Deferred Rendering for Shiny Objects

    Authors: Georgios Kouros, Minye Wu, Tinne Tuytelaars

    Abstract: We introduce RGS-DR, a novel inverse rendering method for reconstructing and rendering glossy and reflective objects with support for flexible relighting and scene editing. Unlike existing methods (e.g., NeRF and 3D Gaussian Splatting), which struggle with view-dependent effects, RGS-DR utilizes a 2D Gaussian surfel representation to accurately estimate geometry and surface normals, an essential p…

    Submitted 5 May, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

  5. arXiv:2504.02522  [pdf, other]

    cs.CV

    Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

    Authors: Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans

    Abstract: The capacity of Vision transformers (ViTs) to handle variable-sized inputs is often constrained by computational complexity and batch processing limitations. Consequently, ViTs are typically trained on small, fixed-size images obtained through downscaling or cropping. While reducing computational burden, these methods result in significant information loss, negatively affecting tasks like image ae…

    Submitted 15 May, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  6. arXiv:2503.15141  [pdf, other]

    cs.CV

    Object-Centric Pretraining via Target Encoder Bootstrapping

    Authors: Nikola Đukić, Tim Lebailly, Tinne Tuytelaars

    Abstract: Object-centric representation learning has recently been successfully applied to real-world datasets. This success can be attributed to pretrained non-object-centric foundation models, whose features serve as reconstruction targets for slot attention. However, targets must remain frozen throughout the training, which sets an upper bound on the performance object-centric models can attain. Attempts…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: ICLR 2025

  7. arXiv:2503.13961  [pdf, other]

    cs.GR cs.CV

    BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering

    Authors: Minye Wu, Haizhao Dai, Kaixin Yao, Tinne Tuytelaars, Jingyi Yu

    Abstract: Differentiable rendering enables efficient optimization by allowing gradients to be computed through the rendering process, facilitating 3D reconstruction, inverse rendering and neural scene representation learning. To ensure differentiability, existing solutions approximate or re-formulate traditional rendering operations using smooth, probabilistic proxies such as volumes or Gaussian primitives.…

    Submitted 18 March, 2025; originally announced March 2025.

  8. arXiv:2503.09321  [pdf, other]

    cs.CV cs.AI cs.LG

    DAVE: Diagnostic benchmark for Audio Visual Evaluation

    Authors: Gorjan Radevski, Teodora Popordanoska, Matthew B. Blaschko, Tinne Tuytelaars

    Abstract: Audio-visual understanding is a rapidly evolving field that seeks to integrate and interpret information from both auditory and visual modalities. Despite recent advances in multi-modal learning, existing benchmarks often suffer from strong visual bias -- where answers can be inferred from visual data alone -- and provide only aggregate scores that conflate multiple sources of error. This makes it…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: First two authors contributed equally

  9. arXiv:2503.06632  [pdf, other]

    cs.CV

    Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias

    Authors: Mingxiao Li, Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Personalized image generation via text prompts has great potential to improve daily life and professional work by facilitating the creation of customized visual content. The aim of image personalization is to create images based on a user-provided subject while maintaining both consistency of the subject and flexibility to accommodate various textual descriptions of that subject. However, current…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 18

  10. arXiv:2502.21313  [pdf, other]

    cs.CV cs.LG

    Unsupervised Parameter Efficient Source-free Post-pretraining

    Authors: Abhishek Jha, Tinne Tuytelaars, Yuki M. Asano

    Abstract: Following the success in NLP, the best vision models are now in the billion parameter ranges. Adapting these large models to a target distribution has become computationally and economically prohibitive. Addressing this challenge, we introduce UpStep, an Unsupervised Parameter-efficient Source-free post-pretraining approach, designed to efficiently adapt a base model from a source domain to a targ…

    Submitted 28 February, 2025; originally announced February 2025.

  11. arXiv:2502.21147  [pdf, other]

    cs.LG cs.CV

    Same accuracy, twice as fast: continuous training surpasses retraining from scratch

    Authors: Eli Verwimp, Guy Hacohen, Tinne Tuytelaars

    Abstract: Continual learning aims to enable models to adapt to new datasets without losing performance on previously learned data, often assuming that prior data is no longer available. However, in many practical scenarios, both old and new data are accessible. In such cases, good performance on both datasets is typically achieved by abandoning the model trained on the previous data and re-training a new mo…

    Submitted 28 February, 2025; originally announced February 2025.

  12. arXiv:2502.20518  [pdf, other]

    cs.CV

    On the Role of Individual Differences in Current Approaches to Computational Image Aesthetics

    Authors: Li-Wei Chen, Ombretta Strafforello, Anne-Sofie Maerten, Tinne Tuytelaars, Johan Wagemans

    Abstract: Image aesthetic assessment (IAA) evaluates image aesthetics, a task complicated by image diversity and user subjectivity. Current approaches address this in two stages: Generic IAA (GIAA) models estimate mean aesthetic scores, while Personal IAA (PIAA) models adapt GIAA using transfer learning to incorporate user subjectivity. However, a theoretical understanding of transfer learning between GIAA…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 15 pages

  13. arXiv:2502.20503  [pdf, ps, other]

    cs.CL

    Protecting multimodal large language models against misleading visualizations

    Authors: Jonathan Tonglet, Tinne Tuytelaars, Marie-Francine Moens, Iryna Gurevych

    Abstract: Visualizations play a pivotal role in daily communication in an increasingly data-driven world. Research on multimodal large language models (MLLMs) for automated chart understanding has accelerated massively, with steady improvements on standard benchmarks. However, for MLLMs to be reliable, they must be robust to misleading visualizations, i.e., charts that distort the underlying data, leading re…

    Submitted 31 May, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: Preprint. Code and data available at https://github.com/UKPLab/arxiv2025-misleading-visualizations

  14. arXiv:2502.11927  [pdf, other]

    cs.LG

    Continual Learning Should Move Beyond Incremental Classification

    Authors: Rupert Mitchell, Antonio Alliegro, Raffaello Camoriano, Dustin Carrión-Ojeda, Antonio Carta, Georgia Chalvatzaki, Nikhil Churamani, Carlo D'Eramo, Samin Hamidi, Robin Hesse, Fabian Hinder, Roshni Ramanna Kamath, Vincenzo Lomonaco, Subarnaduti Paul, Francesca Pistilli, Tinne Tuytelaars, Gido M van de Ven, Kristian Kersting, Simone Schaub-Meyer, Martin Mundt

    Abstract: Continual learning (CL) is the sub-field of machine learning concerned with accumulating knowledge in dynamic environments. So far, CL research has mainly focused on incremental classification tasks, where models learn to classify new categories while retaining knowledge of previously learned ones. Here, we argue that maintaining such a focus limits both theoretical development and practical appli…

    Submitted 17 February, 2025; originally announced February 2025.

  15. arXiv:2502.03227  [pdf, other]

    cs.LG cs.CV

    Adversarial Dependence Minimization

    Authors: Pierre-François De Plaen, Tinne Tuytelaars, Marc Proesmans, Luc Van Gool

    Abstract: Many machine learning techniques rely on minimizing the covariance between output feature dimensions to extract minimally redundant representations from data. However, these methods do not eliminate all dependencies/redundancies, as linearly uncorrelated variables can still exhibit nonlinear relationships. This work provides a differentiable and scalable algorithm for dependence minimization that…

    Submitted 5 February, 2025; originally announced February 2025.

  16. arXiv:2411.11066  [pdf, other]

    cs.CV

    TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models

    Authors: Tingyu Qu, Mingxiao Li, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Recent advances in multimodal Large Language Models (LLMs) have shown great success in understanding multi-modal contents. For video understanding tasks, training-based video LLMs are difficult to build due to the scarcity of high-quality, curated video-text paired data. In contrast, paired image-text data are much easier to obtain, and there is substantial similarity between images and videos. Co…

    Submitted 17 November, 2024; originally announced November 2024.

    Comments: work in progress

  17. arXiv:2410.04959  [pdf, ps, other]

    cs.LG stat.ML

    Collapse-Proof Non-Contrastive Self-Supervised Learning

    Authors: Emanuele Sansone, Tim Lebailly, Tinne Tuytelaars

    Abstract: We present a principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing. We theoretically demonstrate that this design introduces an inductive bias that encourages representations to be simultaneously decorrelated and clustered, without explicitly enforcing these properties. This bias provably enhances genera…

    Submitted 6 July, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: ICML 2025

  18. arXiv:2409.18228  [pdf, other]

    cs.CV

    Analysis of Spatial augmentation in Self-supervised models in the purview of training and test distributions

    Authors: Abhishek Jha, Tinne Tuytelaars

    Abstract: In this paper, we present an empirical study of typical spatial augmentation techniques used in self-supervised representation learning methods (both contrastive and non-contrastive), namely random crop and cutout. Our contributions are: (a) we dissociate random cropping into two separate augmentations, overlap and patch, and provide a detailed analysis on the effect of area of overlap and patch s…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: Accepted in ECCV 2024 Workshop on Out-of-distribution generalization in computer vision (OOD-CV)

  19. arXiv:2409.17313  [pdf, other]

    cs.CV cs.AI cs.CL

    Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation

    Authors: Zehao Wang, Minye Wu, Yixin Cao, Yubo Ma, Meiqi Chen, Tinne Tuytelaars

    Abstract: This study presents a novel evaluation framework for the Vision-Language Navigation (VLN) task. It aims to diagnose current models for various instruction categories at a finer-grained level. The framework is structured around the context-free grammar (CFG) of the task. The CFG serves as the basis for the problem decomposition and the core premise of the instruction categories design. We propose a…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024 Findings; project page: https://zehao-wang.github.io/navnuances

  20. arXiv:2409.07098  [pdf, other]

    cs.CV cs.AI

    Diversity-Driven View Subset Selection for Indoor Novel View Synthesis

    Authors: Zehao Wang, Han Zhou, Matthew B. Blaschko, Tinne Tuytelaars, Minye Wu

    Abstract: Novel view synthesis of indoor scenes can be achieved by capturing a monocular video sequence of the environment. However, redundant information caused by artificial movements in the input video data reduces the efficiency of scene modeling. To address this, we formulate the problem as a combinatorial optimization task for view subset selection. In this work, we propose a novel subset selection fr…

    Submitted 21 May, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: 12 pages, TMLR 2025

  21. arXiv:2408.12481  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors

    Authors: Manuele Rusci, Francesco Paci, Marco Fariselli, Eric Flamand, Tinne Tuytelaars

    Abstract: This paper proposes a self-learning method to incrementally train (fine-tune) a personalized Keyword Spotting (KWS) model after the deployment on ultra-low power smart audio sensors. We address the fundamental problem of the absence of labeled training data by assigning pseudo-labels to the new recorded audio frames based on a similarity score with respect to few user recordings. By experimenting…

    Submitted 7 March, 2025; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Published on IEEE IoT Journal

  22. arXiv:2408.10041   

    cs.CV

    Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation

    Authors: Minye Wu, Tinne Tuytelaars

    Abstract: Recent advancements in photo-realistic novel view synthesis have been significantly driven by Gaussian Splatting (3DGS). Nevertheless, the explicit nature of 3DGS data entails considerable storage requirements, highlighting a pressing need for more efficient data representations. To address this, we present Implicit Gaussian Splatting (IGS), an innovative hybrid model that integrates explicit poin…

    Submitted 9 November, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: Please note that the authors discovered configuration errors in the comparisons within the experiment section, resulting in unreliable quantitative results. We advise referencing the results in this paper with caution

  23. arXiv:2408.09053  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models

    Authors: Vladimir Araujo, Marie-Francine Moens, Tinne Tuytelaars

    Abstract: Parameter-efficient fine-tuning (PEFT) methods are increasingly used with pre-trained language models (PLMs) for continual learning (CL). These methods typically involve training a PEFT module for each new task and employing similarity-based selection to route modules during inference. However, they face two major limitations: 1) interference during module training with already learned modules and…

    Submitted 29 October, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted paper EMNLP2024

  24. arXiv:2406.16085  [pdf, other]

    cs.CV

    A Simple Framework for Open-Vocabulary Zero-Shot Segmentation

    Authors: Thomas Stegmüller, Tim Lebailly, Nikola Dukic, Behzad Bozorgtabar, Tinne Tuytelaars, Jean-Philippe Thiran

    Abstract: Zero-shot classification capabilities naturally arise in models trained within a vision-language contrastive framework. Despite their classification prowess, these models struggle in dense tasks like zero-shot open-vocabulary segmentation. This deficiency is often attributed to the absence of localization cues in captions and the intertwined nature of the learning process, which encompasses both i…

    Submitted 1 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  25. arXiv:2406.09935  [pdf, ps, other]

    cs.LG

    Predicting the Susceptibility of Examples to Catastrophic Forgetting

    Authors: Guy Hacohen, Tinne Tuytelaars

    Abstract: Catastrophic forgetting - the tendency of neural networks to forget previously learned data when learning new information - remains a central challenge in continual learning. In this work, we adopt a behavioral approach, observing a connection between learning speed and forgetting: examples learned more quickly are less prone to forgetting. Focusing on replay-based continual learning, we show that…

    Submitted 3 July, 2025; v1 submitted 14 June, 2024; originally announced June 2024.

  26. arXiv:2404.18020  [pdf, other]

    cs.CV

    DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images

    Authors: Maria Mihaela Trusca, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Text-based semantic image editing assumes the manipulation of an image using a natural language instruction. Although recent works are capable of generating creative and qualitative images, the problem is still mostly approached as a black box sensitive to generating unexpected outputs. Therefore, we propose a novel model to enhance the text-based control of an image editor by explicitly reasoning…

    Submitted 27 April, 2024; originally announced April 2024.

  27. arXiv:2404.13766  [pdf, other]

    cs.CV

    Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control

    Authors: Maria Mihaela Trusca, Wolf Nuyts, Jonathan Thomm, Robert Honig, Thomas Hofmann, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Current diffusion models create photorealistic images given a text prompt as input but struggle to correctly bind attributes mentioned in the text to the right objects in the image. This is evidenced by our novel image-graph alignment model called EPViT (Edge Prediction Vision Transformer) for the evaluation of image-text alignment. To alleviate the above problem, we propose focused cross-attentio…

    Submitted 21 April, 2024; originally announced April 2024.

  28. arXiv:2404.12819  [pdf, other]

    cs.CV

    Unveiling the Ambiguity in Neural Inverse Rendering: A Parameter Compensation Analysis

    Authors: Georgios Kouros, Minye Wu, Sushruth Nagesh, Xianling Zhang, Tinne Tuytelaars

    Abstract: Inverse rendering aims to reconstruct the scene properties of objects solely from multiview images. However, it is an ill-posed problem prone to producing ambiguous estimations deviating from physically accurate representations. In this paper, we utilize Neural Microfacet Fields (NMF), a state-of-the-art neural inverse rendering method to illustrate the inherent ambiguity. We propose an evaluation…

    Submitted 19 April, 2024; originally announced April 2024.

  29. arXiv:2403.15102  [pdf, other]

    cs.RO

    Driving from Vision through Differentiable Optimal Control

    Authors: Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son

    Abstract: This paper proposes DriViDOC: a framework for Driving from Vision through Differentiable Optimal Control, and its application to learn autonomous driving controllers from human demonstrations. DriViDOC combines the automatic inference of relevant features from camera frames with the properties of nonlinear model predictive control (NMPC), such as constraint satisfaction. Our approach leverages the…

    Submitted 2 September, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: This work has been accepted for publication in the Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024). Accompanying video available at: https://youtu.be/ENHhphpbPLs

  30. arXiv:2403.10179  [pdf, other]

    cs.CV

    Animate Your Motion: Turning Still Images into Dynamic Videos

    Authors: Mingxiao Li, Bo Wan, Marie-Francine Moens, Tinne Tuytelaars

    Abstract: In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on employing either semantic cues, like images or depth maps, or motion-based conditions, like moving sketches or object bounding boxes. Semantic inputs offer a rich s…

    Submitted 16 July, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted at European Conference on Computer Vision (ECCV 2024)

  31. arXiv:2403.09377  [pdf, other]

    cs.CV

    Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks

    Authors: Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Mainstream parameter-efficient fine-tuning (PEFT) methods, such as LoRA or Adapter, project a model's hidden states to a lower dimension, allowing pre-trained models to adapt to new data through this low-rank bottleneck. However, PEFT tasks involving multiple modalities, like vision-language (VL) tasks, require not only adaptation to new data but also learning the relationship between different mo…

    Submitted 12 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted at ECCV 2024

  32. arXiv:2402.14957  [pdf, other]

    cs.CV cs.LG

    The Common Stability Mechanism behind most Self-Supervised Learning Approaches

    Authors: Abhishek Jha, Matthew B. Blaschko, Yuki M. Asano, Tinne Tuytelaars

    Abstract: Last couple of years have witnessed a tremendous progress in self-supervised learning (SSL), the success of which can be attributed to the introduction of useful inductive biases in the learning process to learn meaningful visual representations while avoiding collapse. These inductive biases and constraints manifest themselves in the form of different optimization formulations in the SSL techniqu…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Additional visualizations (.gif): https://github.com/abskjha/CenterVectorSSL

  33. arXiv:2312.16731  [pdf, other]

    cs.LG cs.CV

    Infinite dSprites for Disentangled Continual Learning: Separating Memory Edits from Generalization

    Authors: Sebastian Dziadzio, Çağatay Yıldız, Gido M. van de Ven, Tomasz Trzciński, Tinne Tuytelaars, Matthias Bethge

    Abstract: The ability of machine learning systems to learn continually is hindered by catastrophic forgetting, the tendency of neural networks to overwrite previously acquired knowledge when learning a new task. Existing methods mitigate this problem through regularization, parameter isolation, or rehearsal, but they are typically evaluated on benchmarks comprising only a handful of tasks. In contrast, huma…

    Submitted 29 July, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: 10 pages, 10 figures

    Journal ref: Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274: 498-513, 2025

  34. arXiv:2312.08586  [pdf, other]

    cs.LG cs.CV stat.ML

    Estimating calibration error under label shift without labels

    Authors: Teodora Popordanoska, Gorjan Radevski, Tinne Tuytelaars, Matthew B. Blaschko

    Abstract: In the face of dataset shift, model calibration plays a pivotal role in ensuring the reliability of machine learning systems. Calibration error (CE) is an indicator of the alignment between the predicted probabilities and the classifier accuracy. While prior works have delved into the implications of dataset shift on calibration, existing CE estimators assume access to labels from the target domai…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Preprint

  35. arXiv:2312.06713  [pdf, other]

    cs.CV

    TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video

    Authors: Minye Wu, Zehao Wang, Georgios Kouros, Tinne Tuytelaars

    Abstract: Neural Radiance Fields (NeRF) revolutionize the realm of visual media by providing photorealistic Free-Viewpoint Video (FVV) experiences, offering viewers unparalleled immersion and interactivity. However, the technology's significant storage requirements and the computational complexity involved in generation and rendering currently limit its broader application. To close this gap, this paper pre…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 13 pages, 11 figures

  36. arXiv:2312.05855  [pdf, other]

    cs.CV

    NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences

    Authors: Minye Wu, Tinne Tuytelaars

    Abstract: Adopting Neural Radiance Fields (NeRF) to long-duration dynamic sequences has been challenging. Existing methods struggle to balance between quality and storage size and encounter difficulties with complex scene changes such as topological changes and large motions. To tackle these issues, we propose a novel neural video-based radiance fields (NeVRF) representation. NeVRF marries neural radiance f…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 11 pages, 12 figures

  37. arXiv:2311.14028  [pdf, other]

    cs.LG cs.AI cs.CV

    Continual Learning of Diffusion Models with Generative Distillation

    Authors: Sergi Masip, Pau Rodriguez, Tinne Tuytelaars, Gido M. van de Ven

    Abstract: Diffusion models are powerful generative models that achieve state-of-the-art performance in image synthesis. However, training them demands substantial amounts of data and computational resources. Continual learning would allow for incrementally learning new tasks and accumulating knowledge, thus enabling the reuse of trained models for further learning. One potentially suitable continual learnin…

    Submitted 20 May, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: To appear in the Proceedings of the Third Conference on Lifelong Learning Agents (CoLLAs), 2024

    Journal ref: Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274: 431-456, 2025

  38. arXiv:2311.11908  [pdf, other]

    cs.LG cs.AI cs.CV

    Continual Learning: Applications and the Road Forward

    Authors: Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke Hüllermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, Gido M. van de Ven

    Abstract: Continual learning is a subfield of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by examining recent continual learning papers published at four…

    Submitted 28 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Journal ref: Transactions on Machine Learning Research (TMLR), 2024

  39. Contrastive Learning for Multi-Object Tracking with Transformers

    Authors: Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool

    Abstract: The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-l…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  40. arXiv:2311.04898  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How

    Authors: Timm Hess, Tinne Tuytelaars, Gido M. van de Ven

    Abstract: Recent years have seen considerable progress in the continual training of deep neural networks, predominantly thanks to approaches that add replay or regularization terms to the loss function to approximate the joint loss over all tasks so far. However, we show that even with a perfect approximation to the joint loss, these approaches still suffer from temporary but substantial forgetting when sta…

    Submitted 21 June, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Full paper version of pre-registered report accepted at the 1st ContinualAI Unconference. The originally submitted pre-registered proposal can be found at arXiv:2311.04898v1

    Journal ref: Proceedings of the 1st ContinualAI Unconference, 2023, PMLR 249: 37-61

  41. arXiv:2310.19252  [pdf, other]

    cs.CV cs.AI cs.LG

    Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union

    Authors: Zifu Wang, Maxim Berman, Amal Rannen-Triki, Philip H. S. Torr, Devis Tuia, Tinne Tuytelaars, Luc Van Gool, Jiaqian Yu, Matthew B. Blaschko

    Abstract: Semantic segmentation datasets often exhibit two types of imbalance: class imbalance, where some classes appear more frequently than others and size imbalance, where some objects occupy more pixels than others. This causes traditional evaluation metrics to be biased towards majority classes (e.g. overall pixel-wise accuracy) and large objects (e.g. mean pixel-wi…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  42. arXiv:2310.07855  [pdf, other]

    cs.CV cs.LG

    CrIBo: Self-Supervised Learning via Cross-Image Object-Level Bootstrapping

    Authors: Tim Lebailly, Thomas Stegmüller, Behzad Bozorgtabar, Jean-Philippe Thiran, Tinne Tuytelaars

    Abstract: Leveraging nearest neighbor retrieval for self-supervised representation learning has proven beneficial with object-centric images. However, this approach faces limitations when applied to scene-centric datasets, where multiple objects within an image are only implicitly captured in the global representation. Such global bootstrapping can lead to undesirable entanglement of object representations.…

    Submitted 3 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 (spotlight)

  43. arXiv:2309.05069  [pdf, other]

    cs.CV

    Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels

    Authors: Bo Wan, Tinne Tuytelaars

    Abstract: In this paper, we investigate the task of zero-shot human-object interaction (HOI) detection, a novel paradigm for identifying HOIs without the need for task-specific annotations. To address this challenging task, we employ CLIP, a large-scale pre-trained vision-language model (VLM), for knowledge distillation on multiple levels. Specifically, we design a multi-branch neural network that leverages…

    Submitted 10 September, 2023; originally announced September 2023.

  44. arXiv:2308.08530  [pdf, other]

    cs.CV cs.GR

    Ref-DVGO: Reflection-Aware Direct Voxel Grid Optimization for an Improved Quality-Efficiency Trade-Off in Reflective Scene Reconstruction

    Authors: Georgios Kouros, Minye Wu, Shubham Shrivastava, Sushruth Nagesh, Punarjay Chakravarty, Tinne Tuytelaars

    Abstract: Neural Radiance Fields (NeRFs) have revolutionized the field of novel view synthesis, demonstrating remarkable performance. However, the modeling and rendering of reflective objects remain challenging problems. Recent methods have shown significant improvements over the baselines in handling reflective scenes, albeit at the expense of efficiency. In this work, we aim to strike a balance between ef…

    Submitted 21 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 5 pages, 4 figures, 3 tables, ICCV TRICKY 2023 Workshop

  45. arXiv:2308.08325  [pdf, other]

    cs.CV

    Visually-Aware Context Modeling for News Image Captioning

    Authors: Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: News Image Captioning aims to create captions from news articles and images, emphasizing the connection between textual context and visual elements. Recognizing the significance of human faces in news images and the face-name co-occurrence pattern in existing datasets, we propose a face-naming module for learning better name embeddings. Apart from names, which can be directly linked to an image ar…

    Submitted 21 March, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted at NAACL 2024 Main Conference

  46. arXiv:2307.07483  [pdf, other]

    cs.CV

    Multimodal Distillation for Egocentric Action Recognition

    Authors: Gorjan Radevski, Dusan Grujicic, Marie-Francine Moens, Matthew Blaschko, Tinne Tuytelaars

    Abstract: The focal point of egocentric video understanding is modelling hand-object interactions. Standard models, e.g. CNNs or Vision Transformers, which receive RGB frames as input perform well. However, their performance improves further by employing additional input modalities that provide complementary cues, such as object detections, optical flow, audio, etc. The added complexity of the modality-spec…

    Submitted 18 July, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted at ICCV 2023; Codebase released at https://github.com/gorjanradevski/multimodal-distillation

  47. arXiv:2307.02402  [pdf, other]

    cs.CV cs.LG

    Unbalanced Optimal Transport: A Unified Framework for Object Detection

    Authors: Henri De Plaen, Pierre-François De Plaen, Johan A. K. Suykens, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool

    Abstract: During training, supervised object detection tries to correctly match the predicted bounding boxes and associated classification scores to the ground truth. This is essential to determine which predictions are to be pushed towards which solutions, or to be discarded. Popular matching strategies include matching to the closest ground truth box (mostly used in combination with anchors), or matching…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  48. arXiv:2307.01545  [pdf, other]

    cs.CV

    EffSeg: Efficient Fine-Grained Instance Segmentation using Structure-Preserving Sparsity

    Authors: Cédric Picron, Tinne Tuytelaars

    Abstract: Many two-stage instance segmentation heads predict a coarse 28x28 mask per instance, which is insufficient to capture the fine-grained details of many objects. To address this issue, PointRend and RefineMask predict a 112x112 segmentation mask resulting in higher quality segmentations. Both methods however have limitations by either not having access to neighboring features (PointRend) or by perfo…

    Submitted 4 July, 2023; originally announced July 2023.

  49. arXiv:2306.02947  [pdf, other]

    cs.LG cs.CV

    Continual Learning with Pretrained Backbones by Tuning in the Input Space

    Authors: Simone Marullo, Matteo Tiezzi, Marco Gori, Stefano Melacci, Tinne Tuytelaars

    Abstract: The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks. This issue is critical in practical supervised learning settings, such as the ones in which a pre-trained model computes projections toward a latent space where different task predictors are sequentially learned over time. As a matter of fact, in…

    Submitted 8 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

  50. arXiv:2306.02161  [pdf, other]

    cs.LG

    Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems

    Authors: Manuele Rusci, Tinne Tuytelaars

    Abstract: A personalized KeyWord Spotting (KWS) pipeline typically requires the training of a Deep Learning model on a large set of user-defined speech utterances, preventing fast customization directly applied on-device. To fill this gap, this paper investigates few-shot learning methods for open-set KWS classification by combining a deep feature encoder with a prototype-based classifier. With user-defined…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023