Showing 1–50 of 937 results for author: Van, L

Searching in archive cs.
  1. arXiv:2506.15513  [pdf, ps, other]

    cs.LG cs.AI

    RePCS: Diagnosing Data Memorization in LLM-Powered Retrieval-Augmented Generation

    Authors: Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Luong Van Nghia

    Abstract: Retrieval-augmented generation (RAG) has become a common strategy for updating large language model (LLM) responses with current, external information. However, models may still rely on memorized training data, bypass the retrieved evidence, and produce contaminated outputs. We introduce Retrieval-Path Contamination Scoring (RePCS), a diagnostic method that detects such behavior without requiring…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures, 5 tables

  2. arXiv:2506.13680  [pdf, ps, other]

    cs.LG stat.ME

    Hybrid Meta-learners for Estimating Heterogeneous Treatment Effects

    Authors: Zhongyuan Liang, Lars van der Laan, Ahmed Alaa

    Abstract: Estimating conditional average treatment effects (CATE) from observational data involves modeling decisions that differ from supervised learning, particularly concerning how to regularize model complexity. Previous approaches can be grouped into two primary "meta-learner" paradigms that impose distinct inductive biases. Indirect meta-learners first fit and regularize separate potential outcome (PO…

    Submitted 16 June, 2025; originally announced June 2025.

  3. arXiv:2506.08954  [pdf, ps, other]

    q-bio.QM cs.LG q-bio.BM

    Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction

    Authors: Ruben Weitzman, Peter Mørch Groth, Lood Van Niekerk, Aoi Otani, Yarin Gal, Debora Marks, Pascal Notin

    Abstract: Retrieving homologous protein sequences is essential for a broad range of protein modeling tasks such as fitness prediction, protein design, structure modeling, and protein-protein interactions. Traditional workflows have relied on a two-step process: first retrieving homologs via Multiple Sequence Alignments (MSA), then training models on one or more of these alignments. However, MSA-based retrie…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Accepted at ICML 2025

  4. arXiv:2506.08710  [pdf, ps, other]

    cs.CV

    SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting

    Authors: Mengjiao Ma, Qi Ma, Yue Li, Jiahuan Cheng, Runyi Yang, Bin Ren, Nikola Popovic, Mingqiang Wei, Nicu Sebe, Luc Van Gool, Theo Gevers, Martin R. Oswald, Danda Pani Paudel

    Abstract: 3D Gaussian Splatting (3DGS) serves as a highly performant and efficient encoding of scene geometry, appearance, and semantics. Moreover, grounding language in 3D scenes has proven to be an effective strategy for 3D scene understanding. Current Language Gaussian Splatting line of work fall into three main groups: (i) per-scene optimization-based, (ii) per-scene optimization-free, and (iii) general…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 15 pages, codes, data and benchmark will be released

  5. arXiv:2506.08562  [pdf, ps, other]

    cs.CV

    Hierarchical Neural Collapse Detection Transformer for Class Incremental Object Detection

    Authors: Duc Thanh Pham, Hong Dang Nguyen, Nhat Minh Nguyen Quoc, Linh Ngo Van, Sang Dinh Viet, Duc Anh Nguyen

    Abstract: Recently, object detection models have witnessed notable performance improvements, particularly with transformer-based models. However, new objects frequently appear in the real world, requiring detection models to continually learn without suffering from catastrophic forgetting. Although Incremental Object Detection (IOD) has emerged to address this challenge, these existing models are still not…

    Submitted 10 June, 2025; originally announced June 2025.

  6. arXiv:2506.06719  [pdf, ps, other]

    cs.CV cs.AI

    Improving Wildlife Out-of-Distribution Detection: Africas Big Five

    Authors: Mufhumudzi Muthivhi, Jiahao Huo, Fredrik Gustafsson, Terence L. van Zyl

    Abstract: Mitigating human-wildlife conflict seeks to resolve unwanted encounters between these parties. Computer Vision provides a solution to identifying individuals that might escalate into conflict, such as members of the Big Five African animals. However, environments often contain several varied species. The current state-of-the-art animal classification models are trained under a closed-world assumpt…

    Submitted 7 June, 2025; originally announced June 2025.

  7. arXiv:2506.05872  [pdf, ps, other]

    cs.CV

    Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection

    Authors: Yu Li, Xingyu Qiu, Yuqian Fu, Jie Chen, Tianwen Qian, Xu Zheng, Danda Pani Paudel, Yanwei Fu, Xuanjing Huang, Luc Van Gool, Yu-Gang Jiang

    Abstract: Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to detect novel objects with only a handful of labeled samples from previously unseen domains. While data augmentation and generative methods have shown promise in few-shot learning, their effectiveness for CD-FSOD remains unclear due to the need for both visual realism and domain alignment. Existing strategies, such as copy-paste augmentation…

    Submitted 6 June, 2025; originally announced June 2025.

  8. arXiv:2506.05862  [pdf, ps, other]

    cs.CV cs.LG

    Improved Allergy Wheal Detection for the Skin Prick Automated Test Device

    Authors: Rembert Daems, Sven Seys, Valérie Hox, Adam Chaker, Glynnis De Greve, Winde Lemmens, Anne-Lise Poirrier, Eline Beckers, Zuzana Diamant, Carmen Dierickx, Peter W. Hellings, Caroline Huart, Claudia Jerin, Mark Jorissen, Hanne Oscé, Karolien Roux, Mark Thompson, Sophie Tombu, Saartje Uyttebroek, Andrzej Zarowski, Senne Gorris, Laura Van Gerven, Dirk Loeckx, Thomas Demeester

    Abstract: Background: The skin prick test (SPT) is the gold standard for diagnosing sensitization to inhalant allergies. The Skin Prick Automated Test (SPAT) device was designed for increased consistency in test results, and captures 32 images to be jointly used for allergy wheal detection and delineation, which leads to a diagnosis. Materials and Methods: Using SPAT data from $868$ patients with suspecte…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: This work is presented at Artificial Intelligence in Medicine 2025, this is the longer (10 pages) version

  9. arXiv:2506.05856  [pdf, ps, other]

    cs.CV cs.AI

    Cross-View Multi-Modal Segmentation @ Ego-Exo4D Challenges 2025

    Authors: Yuqian Fu, Runze Wang, Yanwei Fu, Danda Pani Paudel, Luc Van Gool

    Abstract: In this report, we present a cross-view multi-modal object segmentation approach for the object correspondence task in the Ego-Exo4D Correspondence Challenges 2025. Given object queries from one perspective (e.g., ego view), the goal is to predict the corresponding object masks in another perspective (e.g., exo view). To tackle this task, we propose a multimodal condition fusion module that enhanc…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: The 2nd Prize Award of EgoExo4D Relations, Second Joint EgoVis Workshop with CVPR2025, technical report paper is accepted by CVPRW 25

  10. arXiv:2506.03697  [pdf, ps, other]

    quant-ph cs.LG

    RhoDARTS: Differentiable Quantum Architecture Search with Density Matrix Simulations

    Authors: Swagat Kumar, Jan-Nico Zaech, Colin Michael Wilmott, Luc Van Gool

    Abstract: Variational Quantum Algorithms (VQAs) are a promising approach for leveraging powerful Noisy Intermediate-Scale Quantum (NISQ) computers. When applied to machine learning tasks, VQAs give rise to NISQ-compatible Quantum Neural Networks (QNNs), which have been shown to outperform classical neural networks with a similar number of trainable parameters. While the quantum circuit structures of VQAs fo…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 24 pages, 16 figures

  11. arXiv:2506.03675  [pdf, ps, other]

    cs.CV

    BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation

    Authors: Jialei Chen, Xu Zheng, Danda Pani Paudel, Luc Van Gool, Hiroshi Murase, Daisuke Deguchi

    Abstract: Utilizing multi-modal data enhances scene understanding by providing complementary semantic and geometric information. Existing methods fuse features or distill knowledge from multiple modalities into a unified representation, improving robustness but restricting each modality's ability to fully leverage its strengths in different situations. We reformulate multi-modal semantic segmentation as a m…

    Submitted 4 June, 2025; originally announced June 2025.

  12. arXiv:2506.01667  [pdf, other]

    cs.CV

    EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models

    Authors: Yan Shu, Bin Ren, Zhitong Xiong, Danda Pani Paudel, Luc Van Gool, Begum Demir, Nicu Sebe, Paolo Rota

    Abstract: Large Multimodal Models (LMMs) have demonstrated strong performance in various vision-language tasks. However, they often struggle to comprehensively understand Earth Observation (EO) data, which is critical for monitoring the environment and the effects of human activity on it. In this work, we present EarthMind, a novel vision-language framework for multi-granular and multi-sensor EO data unders…

    Submitted 2 June, 2025; originally announced June 2025.

  13. arXiv:2505.23003  [pdf, ps, other]

    cs.LG cs.AI

    Hybrid Cross-domain Robust Reinforcement Learning

    Authors: Linh Le Pham Van, Minh Hoang Nguyen, Hung Le, Hung The Tran, Sunil Gupta

    Abstract: Robust reinforcement learning (RL) aims to learn policies that remain effective despite uncertainties in its environment, which frequently arise in real-world applications due to variations in environment dynamics. The robust RL methods learn a robust policy by maximizing value under the worst-case models within a predefined uncertainty set. Offline robust RL algorithms are particularly promising…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted at ECML PKDD 2025

  14. arXiv:2505.22246  [pdf, ps, other]

    cs.CV

    StateSpaceDiffuser: Bringing Long Context to Diffusion World Models

    Authors: Nedko Savov, Naser Kazemi, Deheng Zhang, Danda Pani Paudel, Xi Wang, Luc Van Gool

    Abstract: World models have recently become promising tools for predicting realistic visuals based on actions in complex environments. However, their reliance on a short sequence of observations causes them to quickly lose track of context. As a result, visual consistency breaks down after just a few steps, and generated scenes no longer reflect information seen earlier. This limitation of the state-of-the-…

    Submitted 28 May, 2025; originally announced May 2025.

  15. arXiv:2505.18679  [pdf, ps, other]

    cs.CV

    Manifold-aware Representation Learning for Degradation-agnostic Image Restoration

    Authors: Bin Ren, Yawei Li, Xu Zheng, Yuqian Fu, Danda Pani Paudel, Ming-Hsuan Yang, Luc Van Gool, Nicu Sebe

    Abstract: Image Restoration (IR) aims to recover high quality images from degraded inputs affected by various corruptions such as noise, blur, haze, rain, and low light conditions. Despite recent advances, most existing approaches treat IR as a direct mapping problem, relying on shared representations across degradation types without modeling their structural diversity. In this work, we present MIRAGE, a un…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: All-in-One Image Restoration, low-level vision

  16. arXiv:2505.18657  [pdf, ps, other]

    cs.AI

    MLLMs are Deeply Affected by Modality Bias

    Authors: Xu Zheng, Chenfei Liao, Yuqian Fu, Kaiyu Lei, Yuanhuiyi Lyu, Lutao Jiang, Bin Ren, Jialei Chen, Jiawen Wang, Chengxin Li, Linfeng Zhang, Danda Pani Paudel, Xuanjing Huang, Yu-Gang Jiang, Nicu Sebe, Dacheng Tao, Luc Van Gool, Xuming Hu

    Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as texts and images. MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs. This position paper argues that MLLMs are deeply affected by modality bias. Firstly, we diagnose the current state of m…

    Submitted 24 May, 2025; originally announced May 2025.

  17. arXiv:2505.16482  [pdf, ps, other]

    cs.AI cs.NE

    Minimizing the energy depletion in wireless rechargeable sensor networks using bi-level metaheuristic charging schemes

    Authors: Huynh Thi Thanh Binh, Le Van Cuong, Dang Hai Dang, Le Trong Vinh

    Abstract: Recently, Wireless Rechargeable Sensor Networks (WRSNs) that leveraged the advantage of wireless energy transfer technology have opened a promising opportunity in solving the limited energy issue. However, an ineffective charging strategy may reduce the charging performance. Although many practical charging algorithms have been introduced, these studies mainly focus on optimizing the charging path…

    Submitted 22 May, 2025; originally announced May 2025.

  18. arXiv:2505.15616  [pdf, ps, other]

    cs.CV

    LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models

    Authors: Ruilin Yao, Bo Zhang, Jirui Huang, Xinwei Long, Yifang Zhang, Tianyu Zou, Yufei Wu, Shichao Su, Yifan Xu, Wenxi Zeng, Zhaoyu Yang, Guoyou Li, Shilan Zhang, Zichan Li, Yaxiong Chen, Shengwu Xiong, Peng Xu, Jiajun Zhang, Bowen Zhou, David Clifton, Luc Van Gool

    Abstract: Multimodal Large Language Models (MLLMs) have achieved significant advances in integrating visual and linguistic information, yet their ability to reason about complex and real-world scenarios remains limited. The existing benchmarks are usually constructed in the task-oriented manner without guarantee that different task samples come from the same data distribution, thus they often fall short in…

    Submitted 21 May, 2025; originally announced May 2025.

  19. arXiv:2505.15511  [pdf, ps, other]

    cs.LG

    NOMAD Projection

    Authors: Brandon Duderstadt, Zach Nussbaum, Laurens van der Maaten

    Abstract: The rapid adoption of generative AI has driven an explosion in the size of datasets consumed and produced by AI models. Traditional methods for unstructured data visualization, such as t-SNE and UMAP, have not kept up with the pace of dataset scaling. This presents a significant challenge for AI explainability, which relies on methods such as t-SNE and UMAP for exploratory data analysis. In this p…

    Submitted 21 May, 2025; originally announced May 2025.

  20. arXiv:2505.14442  [pdf, ps, other]

    cs.CL cs.AI

    Creative Preference Optimization

    Authors: Mete Ismayilzada, Antonio Laverghetta Jr., Simone A. Luchini, Reet Patel, Antoine Bosselut, Lonneke van der Plas, Roger Beaty

    Abstract: While Large Language Models (LLMs) have demonstrated impressive performance across natural language generation tasks, their ability to generate truly creative content-characterized by novelty, diversity, surprise, and quality-remains limited. Existing methods for enhancing LLM creativity often focus narrowly on diversity or specific tasks, failing to address creativity's multifaceted nature in a g…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 27 pages

  21. arXiv:2505.13944  [pdf, ps, other]

    cs.CL

    Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting

    Authors: Bao-Ngoc Dao, Quang Nguyen, Luyen Ngo Dinh, Minh Le, Nam Le, Linh Ngo Van

    Abstract: Memory-based approaches have shown strong performance in Continual Relation Extraction (CRE). However, storing examples from previous tasks increases memory usage and raises privacy concerns. Recently, prompt-based methods have emerged as a promising alternative, as they do not rely on storing past samples. Despite this progress, current prompt-based techniques face several core challenges in CRE,…

    Submitted 20 May, 2025; originally announced May 2025.

  22. arXiv:2505.12682  [pdf, ps, other]

    cs.LG

    RoFL: Robust Fingerprinting of Language Models

    Authors: Yun-Yun Tsai, Chuan Guo, Junfeng Yang, Laurens van der Maaten

    Abstract: AI developers are releasing large language models (LLMs) under a variety of different licenses. Many of these licenses restrict the ways in which the models or their outputs may be used. This raises the question how license violations may be recognized. In particular, how can we identify that an API or product uses (an adapted version of) a particular LLM? We present a new method that enable model…

    Submitted 19 May, 2025; originally announced May 2025.

  23. arXiv:2505.11907  [pdf, ps, other]

    cs.CV

    Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?

    Authors: Zihao Dongfang, Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Danda Pani Paudel, Luc Van Gool, Kailun Yang, Xuming Hu

    Abstract: The 180x360 omnidirectional field of view captured by 360-degree cameras enables their use in a wide range of applications such as embodied AI and virtual reality. Although recent advances in multimodal large language models (MLLMs) have shown promise in visual-spatial reasoning, most studies focus on standard pinhole-view images, leaving omnidirectional perception largely unexplored. In this pape…

    Submitted 17 May, 2025; originally announced May 2025.

  24. arXiv:2505.09562  [pdf, ps, other]

    cs.CV

    Camera-Only 3D Panoptic Scene Completion for Autonomous Driving through Differentiable Object Shapes

    Authors: Nicola Marinello, Simen Cassiman, Jonas Heylen, Marc Proesmans, Luc Van Gool

    Abstract: Autonomous vehicles need a complete map of their surroundings to plan and act. This has sparked research into the tasks of 3D occupancy prediction, 3D scene completion, and 3D panoptic scene completion, which predict a dense map of the ego vehicle's surroundings as a voxel grid. Scene completion extends occupancy prediction by predicting occluded regions of the voxel grid, and panoptic scene compl…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: Accepted to CVPR 2025 Workshop on Autonomous Driving

  25. arXiv:2505.09306  [pdf, other]

    cs.CV cs.LG

    Predicting butterfly species presence from satellite imagery using soft contrastive regularisation

    Authors: Thijs L van der Plas, Stephen Law, Michael JO Pocock

    Abstract: The growing demand for scalable biodiversity monitoring methods has fuelled interest in remote sensing data, due to its widespread availability and extensive coverage. Traditionally, the application of remote sensing to biodiversity research has focused on mapping and monitoring habitats, but with increasing availability of large-scale citizen-science wildlife observation data, recent methods have…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: To be published in the 2025 CVPR FGVC12 workshop

  26. arXiv:2505.09114  [pdf, ps, other]

    cs.AI cs.LG

    Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer

    Authors: Minh Hoang Nguyen, Linh Le Pham Van, Thommen George Karimpanal, Sunil Gupta, Hung Le

    Abstract: Decision Transformers (DT) play a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. However, DT requires high-quality, comprehensive data to perform optimally. In real-world applications, the lack of training data and the scarcity of optimal behaviours make training on offline datasets challenging, as suboptimal data ca…

    Submitted 13 May, 2025; originally announced May 2025.

  27. arXiv:2505.06635  [pdf, ps, other]

    cs.CV

    Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization

    Authors: Xu Zheng, Yuanhuiyi Lyu, Lutao Jiang, Danda Pani Paudel, Luc Van Gool, Xuming Hu

    Abstract: Fusing and balancing multi-modal inputs from novel sensors for dense prediction tasks, particularly semantic segmentation, is critically important yet remains a significant challenge. One major limitation is the tendency of multi-modal frameworks to over-rely on easily learnable modalities, a phenomenon referred to as unimodal dominance or bias. This issue becomes especially problematic in real-wo…

    Submitted 10 May, 2025; originally announced May 2025.

  28. arXiv:2505.05023  [pdf, other]

    cs.CV

    Split Matching for Inductive Zero-shot Semantic Segmentation

    Authors: Jialei Chen, Xu Zheng, Dongyue Li, Chong Yi, Seigo Ito, Danda Pani Paudel, Luc Van Gool, Hiroshi Murase, Daisuke Deguchi

    Abstract: Zero-shot Semantic Segmentation (ZSS) aims to segment categories that are not annotated during training. While fine-tuning vision-language models has achieved promising results, these models often overfit to seen categories due to the lack of supervision for unseen classes. As an alternative to fully supervised approaches, query-based segmentation has shown great latent in ZSS, as it enables objec…

    Submitted 8 May, 2025; originally announced May 2025.

  29. arXiv:2505.04109  [pdf, other]

    cs.CV

    One2Any: One-Reference 6D Pose Estimation for Any Object

    Authors: Mengya Liu, Siyuan Li, Ajad Chhatkuli, Prune Truong, Luc Van Gool, Federico Tombari

    Abstract: 6D object pose estimation remains challenging for many applications due to dependencies on complete 3D models, multi-view images, or training limited to specific object categories. These requirements make generalization to novel objects difficult for which neither 3D models nor multi-view images may be available. To address this, we propose a novel method One2Any that estimates the relative 6-degr…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: accepted by CVPR 2025

    Journal ref: CVPR 2025

  30. arXiv:2504.19695  [pdf, other]

    cs.CV

    SubGrapher: Visual Fingerprinting of Chemical Structures

    Authors: Lucas Morin, Gerhard Ingmar Meijer, Valéry Weber, Luc Van Gool, Peter W. J. Staar

    Abstract: Automatic extraction of chemical structures from scientific literature plays a crucial role in accelerating research across fields ranging from drug discovery to materials science. Patent documents, in particular, contain molecular information in visual form, which is often inaccessible through traditional text-based searches. In this work, we introduce SubGrapher, a method for the visual fingerpr…

    Submitted 28 April, 2025; originally announced April 2025.

  31. arXiv:2504.16143  [pdf, other]

    eess.SP cs.LG

    A Statistical Approach for Synthetic EEG Data Generation

    Authors: Gideon Vos, Maryam Ebrahimpour, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi

    Abstract: Electroencephalogram (EEG) data is crucial for diagnosing mental health conditions but is costly and time-consuming to collect at scale. Synthetic data generation offers a promising solution to augment datasets for machine learning applications. However, generating high-quality synthetic EEG that preserves emotional and mental health signals remains challenging. This study proposes a method combin…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 24 pages, 10 figures

    MSC Class: 68T01; 92-08

  32. arXiv:2504.14249  [pdf, other]

    cs.CV

    Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation

    Authors: Bin Ren, Eduard Zamfir, Zongwei Wu, Yawei Li, Yidi Li, Danda Pani Paudel, Radu Timofte, Ming-Hsuan Yang, Luc Van Gool, Nicu Sebe

    Abstract: Restoring any degraded image efficiently via just one model has become increasingly significant and impactful, especially with the proliferation of mobile devices. Traditional solutions typically involve training dedicated models per degradation, resulting in inefficiency and redundancy. More recent approaches either introduce additional modules to learn visual prompts, significantly increasing mo…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Efficient All in One Image Restoration

  33. arXiv:2504.13361  [pdf, other]

    cs.NI

    Automated Taxi Booking Operations for Autonomous Vehicles

    Authors: Linh Van Ma, Shoaib Azam, Farzeen Munir, Moongu Jeon

    Abstract: In a conventional taxi booking system, all taxi operations are mostly done by a decision made by drivers which is hard to implement in unmanned vehicles. To address this challenge, we introduce a taxi booking system which assists autonomous vehicles to pick up customers. The system can allocate an autonomous vehicle (AV) as well as plan service trips for a customer request. We use our own AV to se…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: International Conference on Signal Processing and Communication Systems, ICSPCS 2019 (http://www.dspcs-witsp.com/icspcs_2019/index.html)

    Journal ref: ICSPCS 2019

  34. arXiv:2504.12401  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on Event-Based Image Deblurring: Methods and Results

    Authors: Lei Sun, Andrea Alfarano, Peiqi Duan, Shaolin Su, Kaiwei Wang, Boxin Shi, Radu Timofte, Danda Pani Paudel, Luc Van Gool, Qinglin Liu, Wei Yu, Xiaoqian Lv, Lu Yang, Shuigen Wang, Shengping Zhang, Xiangyang Ji, Long Bao, Yuqiang Yang, Jinao Song, Ziyi Wang, Shuang Wen, Heng Sun, Kean Liu, Mingchen Zhong, Senyan Xu, et al. (63 additional authors not shown)

    Abstract: This paper presents an overview of NTIRE 2025 the First Challenge on Event-Based Image Deblurring, detailing the proposed methodologies and corresponding results. The primary goal of the challenge is to design an event-based method that achieves high-quality image deblurring, with performance quantitatively assessed using Peak Signal-to-Noise Ratio (PSNR). Notably, there are no restrictions on com…

    Submitted 16 April, 2025; originally announced April 2025.

  35. arXiv:2504.12276  [pdf, other]

    cs.CV

    The Tenth NTIRE 2025 Image Denoising Challenge Report

    Authors: Lei Sun, Hang Guo, Bin Ren, Luc Van Gool, Radu Timofte, Yawei Li, Xiangyu Kong, Hyunhee Park, Xiaoxuan Yu, Suejin Han, Hakjae Jeon, Jia Li, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Jingyu Ma, Zhijuan Huang, Huiyuan Fu, Hongyuan Yu, Boqi Zhang, Jiawei Shi, Heng Zhang, Huadong Ma, Deepak Kumar Tyagi, et al. (69 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent ad…

    Submitted 16 April, 2025; originally announced April 2025.

  36. arXiv:2504.10685  [pdf, other]

    cs.CV cs.AI

    NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results

    Authors: Yuqian Fu, Xingyu Qiu, Bin Ren, Yanwei Fu, Radu Timofte, Nicu Sebe, Ming-Hsuan Yang, Luc Van Gool, Kaijin Zhang, Qingpeng Nong, Xiugang Dong, Hong Gao, Xiangsheng Zhou, Jiancheng Pan, Yanxing Liu, Xiao He, Jiahao Li, Yuze Sun, Xiaomeng Huang, Zhenyu Zhang, Ran Ma, Yuhan Liu, Zijian Zhuang, Shuai Yi, Yixiong Zou, et al. (37 additional authors not shown)

    Abstract: Cross-Domain Few-Shot Object Detection (CD-FSOD) poses significant challenges to existing object detection and few-shot detection models when applied across domains. In conjunction with NTIRE 2025, we organized the 1st CD-FSOD Challenge, aiming to advance the performance of current object detectors on entirely novel target domains with only limited labeled data. The challenge attracted 152 registe…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: accepted by CVPRW 25 @ NTIRE

  37. arXiv:2504.10028  [pdf, ps, other]

    q-bio.NC cs.AI

    Sequence models for by-trial decoding of cognitive strategies from neural data

    Authors: Rick den Otter, Gabriel Weindel, Sjoerd Stuit, Leendert van Maanen

    Abstract: Understanding the sequence of cognitive operations that underlie decision-making is a fundamental challenge in cognitive neuroscience. Traditional approaches often rely on group-level statistics, which obscure trial-by-trial variations in cognitive strategies. In this study, we introduce a novel machine learning method that combines Hidden Multivariate Pattern analysis with a Structured State Spac…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 15 pages, 6 figures

  38. arXiv:2504.09379  [pdf, other]

    cs.CV

    Low-Light Image Enhancement using Event-Based Illumination Estimation

    Authors: Lei Sun, Yuhan Bao, Jiajun Zhai, Jingyun Liang, Yulun Zhang, Kaiwei Wang, Danda Pani Paudel, Luc Van Gool

    Abstract: Low-light image enhancement (LLIE) aims to improve the visibility of images captured in poorly lit environments. Prevalent event-based solutions primarily utilize events triggered by motion, i.e., ''motion events'' to strengthen only the edge texture, while leaving the high dynamic range and excellent low-light responsiveness of event cameras largely unexplored. This paper instead opens a new aven…

    Submitted 12 April, 2025; originally announced April 2025.

  39. arXiv:2504.05963  [pdf, other]

    cs.FL cs.LO

    Learning Verified Monitors for Hidden Markov Models

    Authors: Luko van der Maas, Sebastian Junges

    Abstract: Runtime monitors assess whether a system is in an unsafe state based on a stream of observations. We study the problem where the system is subject to probabilistic uncertainty and described by a hidden Markov model. A stream of observations is then unsafe if the probability of being in an unsafe state is above a threshold. A correct monitor recognizes the set of unsafe observations. The key contri…

    Submitted 11 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  40. arXiv:2504.03603  [pdf, other]

    cs.AI cs.LG

    Towards deployment-centric multimodal AI beyond vision and language

    Authors: Xianyuan Liu, Jiayang Zhang, Shuo Zhou, Thijs L. van der Plas, Avish Vijayaraghavan, Anastasiia Grishina, Mengdie Zhuang, Daniel Schofield, Christopher Tomlinson, Yuhan Wang, Ruizhe Li, Louisa van Zeeland, Sina Tabakhi, Cyndie Demeocq, Xiang Li, Arunav Das, Orlando Timmerman, Thomas Baldwin-McDonald, Jinge Wu, Peizhen Bai, Zahraa Al Sahili, Omnia Alwazzan, Thao N. Do, Mohammod N. I. Suvon, Angeline Wang, et al. (23 additional authors not shown)

    Abstract: Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that in…

    Submitted 4 April, 2025; originally announced April 2025.

  41. arXiv:2504.02515  [pdf, other]

    cs.CV

    Exploration-Driven Generative Interactive Environments

    Authors: Nedko Savov, Naser Kazemi, Mohammad Mahdi, Danda Pani Paudel, Xi Wang, Luc Van Gool

    Abstract: Modern world models require costly and time-consuming collection of large video datasets with action demonstrations by people or by environment-specific agents. To simplify training, we focus on using many virtual environments for inexpensive, automatically collected interaction data. Genie, a recent multi-environment world model, demonstrates simulation abilities of many environments with shared…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Accepted at CVPR 2025

  42. arXiv:2503.22869  [pdf, ps, other]

    cs.CV

    SIGHT: Synthesizing Image-Text Conditioned and Geometry-Guided 3D Hand-Object Trajectories

    Authors: Alexey Gavryushin, Alexandros Delitzas, Luc Van Gool, Marc Pollefeys, Kaichun Mo, Xi Wang

    Abstract: When humans grasp an object, they naturally form trajectories in their minds to manipulate it for specific tasks. Modeling hand-object interaction priors holds significant potential to advance robotic and embodied AI systems in learning to operate effectively within the physical world. We introduce SIGHT, a novel task focused on generating realistic and physically plausible 3D hand-object interact…

    Submitted 29 May, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  43. arXiv:2503.21588  [pdf, other]

    cs.LG physics.ao-ph

    Generalizable Implicit Neural Representations via Parameterized Latent Dynamics for Baroclinic Ocean Forecasting

    Authors: Guang Zhao, Xihaier Luo, Seungjun Lee, Yihui Ren, Shinjae Yoo, Luke Van Roekel, Balu Nadiga, Sri Hari Krishna Narayanan, Yixuan Sun, Wei Xu

    Abstract: Mesoscale ocean dynamics play a critical role in climate systems, governing heat transport, hurricane genesis, and drought patterns. However, simulating these processes at high resolution remains computationally prohibitive due to their nonlinear, multiscale nature and vast spatiotemporal domains. Implicit neural representations (INRs) reduce the computational costs as resolution-independent surro…

    Submitted 27 March, 2025; originally announced March 2025.

  44. arXiv:2503.18445  [pdf, other]

    cs.CV

    Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness

    Authors: Chenfei Liao, Kaiyu Lei, Xu Zheng, Junha Moon, Zhixiong Wang, Yixuan Wang, Danda Pani Paudel, Luc Van Gool, Xuming Hu

    Abstract: Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities. Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality. Robustness has thus become essential for practical MMSS applications. However, the absenc…

    Submitted 10 April, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: This paper has been accepted by the CVPR 2025 Workshop TMM-OpenWorld as an oral presentation

  45. arXiv:2503.18052  [pdf, ps, other]

    cs.CV

    SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

    Authors: Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel

    Abstract: Recognizing arbitrary or previously unseen categories is essential for comprehensive real-world 3D scene understanding. Currently, all existing methods rely on 2D or textual modalities during training or together at inference. This highlights the clear absence of a model capable of processing 3D data alone for learning semantics end-to-end, along with the necessary data to train such a model. Mean…

    Submitted 3 June, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: Our code, model, and dataset will be released at https://unique1i.github.io/SceneSplat_webpage/

  46. arXiv:2503.18016  [pdf, other]

    cs.CV

    Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook

    Authors: Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc Van Gool, Xuming Hu

    Abstract: Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI), particularly in enhancing the capabilities of large language models (LLMs) by enabling access to external, reliable, and up-to-date knowledge sources. In the context of AI-Generated Content (AIGC), RAG has proven invaluable by augmenting model outputs with supplementary, relevant information, t…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 19 pages, 10 figures

  47. arXiv:2503.16591  [pdf, other]

    cs.CV

    UniK3D: Universal Camera Monocular 3D Estimation

    Authors: Luigi Piccinelli, Christos Sakaridis, Mattia Segu, Yung-Hsu Yang, Siyuan Li, Wim Abbeloos, Luc Van Gool

    Abstract: Monocular 3D estimation is crucial for visual perception. However, current methods fall short by relying on oversimplified assumptions, such as pinhole camera models or rectified images. These limitations severely restrict their general applicability, causing poor performance in real-world scenarios with fisheye or panoramic images and resulting in substantial context loss. To address this, we pre…

    Submitted 20 March, 2025; originally announced March 2025.

  48. arXiv:2503.16096  [pdf, other]

    cs.CV

    MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures

    Authors: Lucas Morin, Valéry Weber, Ahmed Nassar, Gerhard Ingmar Meijer, Luc Van Gool, Yawei Li, Peter Staar

    Abstract: The automated analysis of chemical literature holds promise to accelerate discovery in fields such as material science and drug development. In particular, search capabilities for chemical structures and Markush structures (chemical structure templates) within patent documents are valuable, e.g., for prior-art search. Advancements have been made in the automatic extraction of chemical structures f…

    Submitted 20 March, 2025; originally announced March 2025.

  49. arXiv:2503.08183  [pdf, other]

    physics.comp-ph cs.CV physics.optics

    Physics-based AI methodology for Material Parameter Extraction from Optical Data

    Authors: M. Koumans, J. L. M. van Mechelen

    Abstract: We report on a novel methodology for extracting material parameters from spectroscopic optical data using a physics-based neural network. The proposed model integrates classical optimization frameworks with a multi-scale object detection framework, specifically exploring the effect of incorporating physics into the neural network. We validate and analyze its performance on simulated transmission s…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Submitted for IRMMW-THz 2025 conference proceedings

  50. arXiv:2503.07172  [pdf, other]

    cs.AI cs.LO cs.SE

    Lawful and Accountable Personal Data Processing with GDPR-based Access and Usage Control in Distributed Systems

    Authors: L. Thomas van Binsbergen, Marten C. Steketee, Milen G. Kebede, Heleen L. Janssen, Tom M. van Engers

    Abstract: Compliance with the GDPR privacy regulation places a significant burden on organisations regarding the handling of personal data. The perceived efforts and risks of complying with the GDPR further increase when data processing activities span across organisational boundaries, as is the case in both small-scale data sharing settings and in large-scale international data spaces. This paper address…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Submitted for review to the Journal of AI and Law, 49 pages (including)