-
Reliable uncertainty quantification for 2D/3D anatomical landmark localization using multi-output conformal prediction
Authors:
Jef Jonkers,
Frank Coopman,
Luc Duchateau,
Glenn Van Wallendael,
Sofie Van Hoecke
Abstract:
Automatic anatomical landmark localization in medical imaging requires not just accurate predictions but reliable uncertainty quantification for effective clinical decision support. Current uncertainty quantification approaches often fall short, particularly when combined with normality assumptions, systematically underestimating total predictive uncertainty. This paper introduces conformal prediction as a framework for reliable uncertainty quantification in anatomical landmark localization, addressing a critical gap in automatic landmark localization. We present two novel approaches guaranteeing finite-sample validity for multi-output prediction: Multi-output Regression-as-Classification Conformal Prediction (M-R2CCP) and its variant Multi-output Regression to Classification Conformal Prediction set to Region (M-R2C2R). Unlike conventional methods that produce axis-aligned hyperrectangular or ellipsoidal regions, our approaches generate flexible, non-convex prediction regions that better capture the underlying uncertainty structure of landmark predictions. Through extensive empirical evaluation across multiple 2D and 3D datasets, we demonstrate that our methods consistently outperform existing multi-output conformal prediction approaches in both validity and efficiency. This work represents a significant advancement in reliable uncertainty estimation for anatomical landmark localization, providing clinicians with trustworthy confidence measures for their diagnoses. While developed for medical imaging, these methods show promise for broader applications in multi-output regression problems.
Submitted 18 March, 2025;
originally announced March 2025.
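The M-R2CCP abstract above contrasts its flexible regions with conventional multi-output conformal regions that are axis-aligned or ellipsoidal. As a point of reference, here is a minimal sketch of such a conventional split-conformal baseline, not of M-R2CCP or M-R2C2R. It assumes a point predictor has already produced 2D landmark predictions and uses a norm of the residual as the nonconformity score. The function names and the toy data are hypothetical.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample (1 - alpha) conformal quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    if k > n:
        return np.inf  # too few calibration points: the region is unbounded
    return np.sort(scores)[k - 1]


def split_conformal_radius(pred_cal, y_cal, alpha=0.1, norm_ord=2):
    """Radius of a region centred on the point prediction (hypothetical baseline).

    norm_ord=2 gives a disc (2D) or ball (3D); norm_ord=np.inf gives an
    axis-aligned square or cube, i.e. an equal-sided hyperrectangle.
    """
    scores = np.linalg.norm(y_cal - pred_cal, ord=norm_ord, axis=1)  # nonconformity scores
    return conformal_quantile(scores, alpha)


# Toy usage with made-up 2D landmark predictions and isotropic Gaussian errors.
rng = np.random.default_rng(0)
pred_cal = rng.normal(size=(1000, 2))
y_cal = pred_cal + rng.normal(scale=2.0, size=(1000, 2))
radius = split_conformal_radius(pred_cal, y_cal, alpha=0.1)
```

Under exchangeability, the region {y : ||y - prediction|| <= radius} covers the true landmark with probability at least 1 - alpha. Its fixed convex shape is exactly the limitation that the non-convex regions described in the abstract aim to overcome.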
-
landmarker: a Toolkit for Anatomical Landmark Localization in 2D/3D Images
Authors:
Jef Jonkers,
Luc Duchateau,
Glenn Van Wallendael,
Sofie Van Hoecke
Abstract:
Anatomical landmark localization in 2D/3D images is a critical task in medical imaging. Although many general-purpose tools exist for landmark localization in classical computer vision tasks, such as pose estimation, they lack the specialized features and modularity necessary for anatomical landmark localization applications in the medical domain. Therefore, we introduce landmarker, a Python package built on PyTorch. The package provides a comprehensive, flexible toolkit for developing and evaluating landmark localization algorithms, supporting a range of methodologies, including static and adaptive heatmap regression. landmarker enhances the accuracy of landmark identification, streamlines research and development processes, and supports various image formats and preprocessing pipelines. Its modular design allows users to customize and extend the toolkit for specific datasets and applications, accelerating innovation in medical imaging. landmarker addresses a critical need for precision and customization in landmark localization tasks not adequately met by existing general-purpose pose estimation tools.
Submitted 5 May, 2025; v1 submitted 17 January, 2025;
originally announced January 2025.
-
TGIF: Text-Guided Inpainting Forgery Dataset
Authors:
Hannes Mareen,
Dimitrios Karageorgiou,
Glenn Van Wallendael,
Peter Lambert,
Symeon Papadopoulos
Abstract:
Digital image manipulation has become increasingly accessible and realistic with the advent of generative AI technologies. Recent developments allow for text-guided inpainting, making sophisticated image edits possible with minimal effort. This poses new challenges for digital media forensics. For example, diffusion model-based approaches could either splice the inpainted region into the original image, or regenerate the entire image. In the latter case, traditional image forgery localization (IFL) methods typically fail. This paper introduces the Text-Guided Inpainting Forgery (TGIF) dataset, a comprehensive collection of images designed to support the training and evaluation of image forgery localization and synthetic image detection (SID) methods. The TGIF dataset includes approximately 75k forged images, originating from popular open-source and commercial methods, namely SD2, SDXL, and Adobe Firefly. We benchmark several state-of-the-art IFL and SID methods on TGIF. Whereas traditional IFL methods can detect spliced images, they fail to detect regenerated inpainted images. Moreover, traditional SID methods may flag the regenerated inpainted images as fake, but cannot localize the inpainted area. Finally, both IFL and SID methods fail when exposed to stronger compression, and they are less robust to modern compression algorithms, such as WEBP. In conclusion, this work demonstrates the ineffectiveness of state-of-the-art detectors on local manipulations performed by modern generative approaches, and aspires to help with the development of more capable IFL and SID methods. The dataset and code can be downloaded at https://github.com/IDLabMedia/tgif-dataset.
Submitted 4 October, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Conformal Predictive Systems Under Covariate Shift
Authors:
Jef Jonkers,
Glenn Van Wallendael,
Luc Duchateau,
Sofie Van Hoecke
Abstract:
Conformal Predictive Systems (CPS) offer a versatile framework for constructing predictive distributions, allowing for calibrated inference and informative decision-making. However, their applicability has been limited to scenarios adhering to the Independent and Identically Distributed (IID) model assumption. This paper extends CPS to accommodate scenarios characterized by covariate shifts. We therefore propose Weighted CPS (WCPS), akin to Weighted Conformal Prediction (WCP), leveraging likelihood ratios between training and testing covariate distributions. This extension enables the construction of nonparametric predictive distributions capable of handling covariate shifts. We present theoretical underpinnings and conjectures regarding the validity and efficacy of WCPS and demonstrate its utility through empirical evaluations on both synthetic and real-world datasets. Our simulation experiments indicate that WCPS are probabilistically calibrated under covariate shift.
Submitted 16 September, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
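The WCPS abstract above describes reweighting calibration data by likelihood ratios between the test and training covariate distributions. The following is a minimal sketch of that idea for a split (inductive) CPS, under stated assumptions: residual-based conformity scores, known likelihood ratios, and the conservative convention of placing the test point's weight at +infinity. It is an illustration in the spirit of weighted conformal prediction, not the authors' exact construction; all names are hypothetical.

```python
import numpy as np


def weighted_cps_cdf(y_grid, pred_test, pred_cal, y_cal, w_cal, w_test):
    """Conservative weighted split-CPS predictive CDF evaluated at y_grid (sketch).

    Q(y) = sum_i p_i * 1{pred_test + r_i <= y}, with residuals r_i = y_i - pred_i and
    p_i = w_i / (sum_j w_j + w_test). The leftover mass w_test / (...) is placed
    at +inf, so Q stays below 1 for every finite y.
    """
    residuals = np.asarray(y_cal) - np.asarray(pred_cal)
    p = np.asarray(w_cal, dtype=float) / (np.sum(w_cal) + w_test)
    support = pred_test + residuals                       # candidate outcome values
    y_grid = np.atleast_1d(np.asarray(y_grid, dtype=float))
    below = support[None, :] <= y_grid[:, None]           # shape (len(y_grid), n_cal)
    return (below * p[None, :]).sum(axis=1)


# Toy covariate shift: calibration x ~ N(0, 1), test x ~ N(1, 1).
# The density ratio N(1, 1) / N(0, 1) evaluated at x equals exp(x - 0.5).
rng = np.random.default_rng(1)
x_cal = rng.normal(0.0, 1.0, 500)
y_cal = 2.0 * x_cal + rng.normal(0.0, 1.0 + np.abs(x_cal))  # heteroscedastic noise
pred_cal = 2.0 * x_cal                                       # stand-in for a fitted model
x_test = 1.5
cdf = weighted_cps_cdf(np.linspace(-3.0, 9.0, 13), 2.0 * x_test, pred_cal, y_cal,
                       w_cal=np.exp(x_cal - 0.5), w_test=np.exp(x_test - 0.5))
```

With all weights equal to 1, the sketch reduces to the ordinary split CPS, whose CDF tops out at n / (n + 1). In practice the likelihood ratio is rarely known and would have to be estimated, for example with a probabilistic classifier that separates training from test covariates.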
-
Blind Deep-Learning-Based Image Watermarking Robust Against Geometric Transformations
Authors:
Hannes Mareen,
Lucas Antchougov,
Glenn Van Wallendael,
Peter Lambert
Abstract:
Digital watermarking enables protection against copyright infringement of images. Although existing methods embed watermarks imperceptibly and demonstrate robustness against attacks, they typically lack resilience against geometric transformations. Therefore, this paper proposes a new watermarking method that is robust against geometric attacks. The proposed method is based on the existing HiDDeN architecture, which uses deep learning for watermark encoding and decoding. We add new noise layers to this architecture, namely differentiable JPEG estimation, rotation, rescaling, translation, shearing and mirroring. We demonstrate that our method outperforms the state of the art in terms of geometric robustness. In conclusion, the proposed method can be used to protect images when viewed on consumers' devices.
Submitted 14 February, 2024;
originally announced February 2024.
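The watermarking abstract above lists rotation, rescaling, translation, shearing and mirroring among its new noise layers. One generic way to sample such a geometric attack is to compose random 2D affine matrices, as in the minimal sketch below. The parameter ranges, the composition order, and the function name are hypothetical and are not taken from the paper.

```python
import numpy as np


def random_geometric_affine(rng, max_rot_deg=30.0, scale_range=(0.8, 1.2),
                            max_shift=0.1, max_shear=0.2, p_mirror=0.5):
    """Sample a 3x3 homogeneous matrix composing mirror, shear, scale, rotation, shift.

    All ranges are illustrative assumptions; shifts are in normalized image coordinates.
    """
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    s = rng.uniform(*scale_range)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    sh = rng.uniform(-max_shear, max_shear)
    mirror = -1.0 if rng.random() < p_mirror else 1.0

    M = np.diag([mirror, 1.0, 1.0])                             # horizontal flip
    Sh = np.array([[1.0, sh, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # shear along x
    S = np.diag([s, s, 1.0])                                    # isotropic rescale
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])                             # rotation
    T = np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])   # translation
    return T @ R @ S @ Sh @ M
```

To use such a matrix as a differentiable noise layer in PyTorch, one could pass its top two rows (as an N x 2 x 3 tensor) to `torch.nn.functional.affine_grid` and resample the watermarked image with `grid_sample`. Note that `affine_grid` expects the mapping from output to input coordinates.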
-
Conformal Convolution and Monte Carlo Meta-learners for Predictive Inference of Individual Treatment Effects
Authors:
Jef Jonkers,
Jarne Verhaeghe,
Glenn Van Wallendael,
Luc Duchateau,
Sofie Van Hoecke
Abstract:
Generating probabilistic forecasts of potential outcomes and individual treatment effects (ITE) is essential for risk-aware decision-making in domains such as healthcare, policy, marketing, and finance. We propose two novel methods, the conformal convolution T-learner (CCT) and the conformal Monte Carlo (CMC) meta-learner, that generate full predictive distributions of both potential outcomes and ITEs. Our approaches combine weighted conformal predictive systems with either analytic convolution of potential outcome distributions or Monte Carlo sampling, addressing covariate shift through propensity score weighting. In contrast to other approaches that allow the generation of potential outcome predictive distributions, our approaches are model agnostic, universal, and come with finite-sample guarantees of probabilistic calibration under knowledge of the propensity score. Regarding the estimation of the ITE distribution, we formally characterize how assumptions about the dependency between the potential outcomes' noise terms impact distribution validity, and establish universal consistency under noise-independence assumptions. Experiments on synthetic and semi-synthetic datasets demonstrate that the proposed methods achieve probabilistically calibrated predictive distributions while maintaining narrow prediction intervals and competitive continuous ranked probability scores. Besides probabilistic forecasting performance, we observe significant efficiency gains for the CCT and CMC meta-learners compared to other conformal approaches that produce prediction intervals for ITE with coverage guarantees.
Submitted 20 May, 2025; v1 submitted 7 February, 2024;
originally announced February 2024.
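The CCT/CMC abstract above obtains an ITE distribution either by convolving the two potential-outcome distributions or by Monte Carlo sampling. Under the assumption that the two outcomes are independent, both routes can be illustrated for discrete predictive distributions (support points with weights, such as those a conformal predictive system produces). This minimal sketch shows only that final step. It does not reproduce how the meta-learners build the potential-outcome distributions with propensity-weighted CPS, and the function names are hypothetical.

```python
import numpy as np


def ite_distribution_exact(y1_pts, y1_w, y0_pts, y0_w):
    """Discrete distribution of Y(1) - Y(0), assuming independent potential outcomes."""
    y1_w = np.asarray(y1_w, float) / np.sum(y1_w)
    y0_w = np.asarray(y0_w, float) / np.sum(y0_w)
    diffs = (np.asarray(y1_pts)[:, None] - np.asarray(y0_pts)[None, :]).ravel()
    weights = (y1_w[:, None] * y0_w[None, :]).ravel()  # product measure = independence
    order = np.argsort(diffs, kind="stable")
    return diffs[order], weights[order]


def ite_samples_mc(y1_pts, y1_w, y0_pts, y0_w, n_samples, rng):
    """Monte Carlo alternative: draw each outcome independently and subtract."""
    a = rng.choice(y1_pts, size=n_samples, p=np.asarray(y1_w, float) / np.sum(y1_w))
    b = rng.choice(y0_pts, size=n_samples, p=np.asarray(y0_w, float) / np.sum(y0_w))
    return a - b
```

The exact version produces n1 x n0 support points, which grows quickly. Sampling trades that cost for Monte Carlo error. Both rely on the independence assumption that the abstract identifies as decisive for validity.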
-
GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer
Authors:
Deressa Wodajo Deressa,
Hannes Mareen,
Peter Lambert,
Solomon Atnafu,
Zahid Akhtar,
Glenn Van Wallendael
Abstract:
Deepfakes have raised significant concerns due to their potential to spread false information and compromise digital media integrity. Current deepfake detection models often struggle to generalize across a diverse range of deepfake generation techniques and video content. In this work, we propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection. Our model combines ConvNeXt and Swin Transformer models for feature extraction, and it utilizes an Autoencoder and a Variational Autoencoder to learn from the latent data distribution. By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos. The model is trained and evaluated on the DFDC, FF++, TM, DeepfakeTIMIT, and Celeb-DF (v2) datasets. The proposed GenConViT model demonstrates strong performance in deepfake video detection, achieving high accuracy across the tested datasets. While our model shows promising results in deepfake video detection by leveraging visual and latent features, we demonstrate that further work is needed to improve its generalizability, i.e., when encountering out-of-distribution data. Our model provides an effective solution for identifying a wide range of fake videos while preserving media integrity. The open-source code for GenConViT is available at https://github.com/erprogs/GenConViT.
Submitted 4 March, 2025; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Haptic Interactions for Extended Reality
Authors:
Yentl Vermeulen,
Sam Van Damme,
Glenn Van Wallendael,
Filip De Turck,
Maria Torres Vega
Abstract:
This research investigates whether the interaction methods of XR headsets can be improved by using haptic feedback. As a first and most common technique, indirect interactions are considered. Indirect interactions correspond to manipulations of virtual objects from a virtual distance using pre-defined hand gestures. As a second interaction technique, direct interaction (DIM) has been implemented, where the user manipulates objects by virtually touching them with their hands. A third interaction method extends the previous one with haptic feedback (HEDIM). These three methods are compared with each other based on objective and subjective user tests, also taking financial considerations into account. This research concludes that DIM improves upon the standard indirect method. Additionally, it has been observed that haptic feedback could enhance DIM in specific situations. Nevertheless, when considering the current financial cost, our subjects were not convinced by the small improvements haptic feedback brings.
Submitted 8 December, 2022;
originally announced December 2022.
-
Head Movement Modeling for Immersive Visualization in VR
Authors:
Glenn Van Wallendael,
Lucas Liegeois,
Julie Artois,
Peter Lambert
Abstract:
Virtual Reality, and Extended Reality in general, connect the physical body with the virtual world. Movement of our body translates to interactions with this virtual world. Only by moving our head will we see a different perspective. Consequently, the physical restrictions on our body's movement also restrict our capabilities in the virtual world. By modelling the capabilities of human movement, render engines can obtain useful information to pre-cache visual texture information or immersive light information. Such pre-caching becomes vital due to the ever-increasing realism of virtual environments. This work is the first to predict the volume in which the head will be positioned in the future, based on a data-driven binned-ellipsoid technique. The proposed technique can reduce a 1 m³ volume to a size of 10 cm³ with negligible accuracy loss. This volume then provides the render engine with the necessary information to pre-cache visual data.
Submitted 8 December, 2022;
originally announced December 2022.
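The abstract above names a data-driven binned-ellipsoid technique but does not specify how the bins or ellipsoids are defined. Purely as an illustration of the general idea, the sketch below fits one covariance ellipsoid per bin to observed future head displacements. It assumes, hypothetically, that samples are binned by current head speed and that each ellipsoid is scaled to cover a chosen fraction of the samples. It reports the ellipsoid volume, which is the quantity a render engine would use to bound what to pre-cache. All names, the binning variable, and the units (metres) are assumptions.

```python
import numpy as np


def fit_covering_ellipsoid(displacements, coverage=0.95):
    """Ellipsoid {d : (d - mu)^T Sigma^-1 (d - mu) <= k2} covering `coverage` of samples."""
    mu = displacements.mean(axis=0)
    sigma = np.cov(displacements, rowvar=False)
    diff = displacements - mu
    m2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)  # squared Mahalanobis
    k2 = np.quantile(m2, coverage)
    # Volume of a 3D ellipsoid with semi-axes sqrt(k2 * eigenvalues of sigma).
    volume = 4.0 / 3.0 * np.pi * k2 ** 1.5 * np.sqrt(np.linalg.det(sigma))
    return mu, sigma, k2, volume


def fit_binned_ellipsoids(displacements, speeds, bin_edges, coverage=0.95):
    """One ellipsoid per (hypothetical) head-speed bin."""
    bins = np.digitize(speeds, bin_edges)
    return {b: fit_covering_ellipsoid(displacements[bins == b], coverage)
            for b in np.unique(bins)}
```

For isotropic displacements with a 10 cm standard deviation per axis, a 95 % covering ellipsoid has a volume of roughly 0.09 m³. Conditioning on a bin variable such as speed is one plausible way to shrink it further, but whether that matches the paper's binning is not stated in the abstract.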
-
Training Data Improvement for Image Forgery Detection using Comprint
Authors:
Hannes Mareen,
Dante Vanden Bussche,
Glenn Van Wallendael,
Luisa Verdoliva,
Peter Lambert
Abstract:
Manipulated images are a threat to consumers worldwide when they are used to spread disinformation. Comprint addresses this threat by enabling forgery detection using JPEG-compression fingerprints. This paper evaluates the impact of the training set on Comprint's performance. Most interestingly, we found that including images compressed with low quality factors during training does not have a significant effect on the accuracy, whereas incorporating recompression boosts the robustness. As such, consumers can use Comprint on their smartphones to verify the authenticity of images.
Submitted 25 November, 2022;
originally announced November 2022.
-
Comprint: Image Forgery Detection and Localization using Compression Fingerprints
Authors:
Hannes Mareen,
Dante Vanden Bussche,
Fabrizio Guillaro,
Davide Cozzolino,
Glenn Van Wallendael,
Peter Lambert,
Luisa Verdoliva
Abstract:
Manipulation tools that realistically edit images are widely available, making it easy for anyone to create and spread misinformation. In an attempt to fight fake news, forgery detection and localization methods were designed. However, existing methods struggle to accurately reveal manipulations found in images on the internet, i.e., in the wild. That is because the type of forgery is typically unknown, in addition to the tampering traces being damaged by recompression. This paper presents Comprint, a novel forgery detection and localization method based on the compression fingerprint or comprint. It is trained on pristine data only, providing generalization to detect different types of manipulation. Additionally, we propose a fusion of Comprint with the state-of-the-art Noiseprint, which utilizes a complementary camera model fingerprint. We carry out an extensive experimental analysis and demonstrate that Comprint has a high level of accuracy on five evaluation datasets that represent a wide range of manipulation types, mimicking in-the-wild circumstances. Most notably, the proposed fusion significantly outperforms state-of-the-art reference methods. As such, Comprint and the fusion Comprint+Noiseprint represent a promising forensics tool to analyze in-the-wild tampered images.
Submitted 5 October, 2022;
originally announced October 2022.
-
SILVR: A Synthetic Immersive Large-Volume Plenoptic Dataset
Authors:
Martijn Courteaux,
Julie Artois,
Stijn De Pauw,
Peter Lambert,
Glenn Van Wallendael
Abstract:
In six-degrees-of-freedom light-field (LF) experiences, the viewer's freedom is limited by the extent to which the plenoptic function was sampled. Existing LF datasets represent only small portions of the plenoptic function, such that they either cover a small volume or have a limited field of view. Therefore, we propose a new LF image dataset, "SILVR", that allows for six-degrees-of-freedom navigation in much larger volumes while maintaining a full panoramic field of view. We rendered three different virtual scenes in various configurations, where the number of views ranges from 642 to 2226. One of these scenes (called Zen Garden) is a novel scene and is made publicly available. We chose to position the virtual cameras closely together in large cuboid and spherical arrangements (2.2 m³ to 48 m³), equipped with 180° fish-eye lenses. Every view is rendered to a color image and depth map of 2048 × 2048 px. Additionally, we present the software used to automate the multi-view rendering process, as well as a lens-reprojection tool that converts images with panoramic or fish-eye projection to a standard rectilinear (i.e., perspective) projection. Finally, we demonstrate how the proposed dataset and software can be used to evaluate LF coding/rendering techniques (in this case, training NeRFs with instant-ngp). As such, we provide the first publicly available LF dataset for large volumes of light with full panoramic field of view.
Submitted 20 April, 2022;
originally announced April 2022.
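The SILVR abstract above mentions a lens-reprojection tool that converts fish-eye images to a rectilinear (perspective) projection. The abstract does not state which fish-eye model the renders use. The minimal sketch below therefore assumes an equidistant model (image radius r = f * theta) whose 180° image circle fills a square input, plus nearest-neighbour sampling. It is an illustration of the geometry, not the SILVR tool itself, and the function and parameter names are hypothetical.

```python
import numpy as np


def fisheye_to_rectilinear(fisheye, out_size=512, out_fov_deg=90.0, fisheye_fov_deg=180.0):
    """Nearest-neighbour reprojection from an (assumed) equidistant fish-eye image.

    Both virtual cameras share the same optical axis; the fish-eye image circle
    is assumed to touch the borders of the input image.
    """
    h, w = fisheye.shape[:2]
    cx_f, cy_f = (w - 1) / 2.0, (h - 1) / 2.0
    radius = min(w, h) / 2.0
    f_fish = radius / np.deg2rad(fisheye_fov_deg / 2.0)                # r = f_fish * theta
    f_rect = (out_size / 2.0) / np.tan(np.deg2rad(out_fov_deg / 2.0))  # pinhole focal length
    c = (out_size - 1) / 2.0
    v, u = np.mgrid[0:out_size, 0:out_size].astype(np.float64)
    x, y = (u - c) / f_rect, (v - c) / f_rect        # ray direction (x, y, 1) per output pixel
    theta = np.arctan(np.hypot(x, y))                # angle between ray and optical axis
    phi = np.arctan2(y, x)                           # azimuth around the optical axis
    r = f_fish * theta                               # radial distance in the fish-eye image
    src_u = np.clip(np.rint(cx_f + r * np.cos(phi)), 0, w - 1).astype(int)
    src_v = np.clip(np.rint(cy_f + r * np.sin(phi)), 0, h - 1).astype(int)
    return fisheye[src_v, src_u]
```

A rectilinear view can never reach a 180° field of view, because tan(theta) diverges at 90°. This is why the sketch extracts a narrower perspective crop (90° by default). Bilinear interpolation would reduce aliasing compared with the nearest-neighbour lookup used here.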
-
Art and Science Interaction Lab -- A highly flexible and modular interaction science research facility
Authors:
Niels Van Kets,
Bart Moens,
Klaas Bombeke,
Wouter Durnez,
Pieter-Jan Maes,
Glenn Van Wallendael,
Lieven De Marez,
Marc Leman,
Peter Lambert
Abstract:
The Art and Science Interaction Lab (ASIL) is a unique, highly flexible and modular interaction science research facility to effectively bring, analyse and test experiences and interactions in mixed virtual/augmented contexts as well as to conduct research on next-gen immersive technologies. It brings together the expertise and creativity of engineers, performers, designers and scientists creating solutions and experiences shaping the lives of people. The lab is equipped with state-of-the-art visual, auditory and user-tracking equipment, fully synchronized and connected to a central backend. This synchronization allows for highly accurate multi-sensor measurements and analysis.
Submitted 27 January, 2021;
originally announced January 2021.
-
Network-Distributed Video Coding
Authors:
Johan De Praeter,
Christopher Hollmann,
Rickard Sjoberg,
Glenn Van Wallendael,
Peter Lambert
Abstract:
Nowadays, an enormous number of videos are streamed every day to countless users, all using different devices and networks. These videos must be adapted to provide users with the most suitable video representation based on their device properties and current network conditions. However, the two most common techniques for video adaptation, simulcast and transcoding, represent two extremes. The former offers excellent scalability but requires a large amount of storage, while the latter has a small storage cost but does not scale to many users due to the additional computing cost per requested representation. As a third, in-between approach, network-distributed video coding (NDVC) was proposed within the Moving Picture Experts Group (MPEG). The aim of NDVC is to reduce the storage cost compared to simulcast while keeping the computing cost lower than that of transcoding. By exploring the proposed techniques for NDVC, we show the workings of this third option for video providers to deliver their content to their clients.
Submitted 12 January, 2021;
originally announced January 2021.
-
Exploratory Study on User's Dynamic Visual Acuity and Quality Perception of Impaired Images
Authors:
Jolien De Letter,
Anissa All,
Lieven De Marez,
Vasileios Avramelos,
Peter Lambert,
Glenn Van Wallendael
Abstract:
In this paper, we assess the impact of head movement on users' visual acuity and their quality perception of impaired images. There are physical limitations on the amount of visual information a person can perceive, and physical limitations on the speed at which our body, and as a consequence our head, can explore a scene. These limitations hold fundamental solutions for multimedia communication systems. As such, subjects were asked to evaluate the perceptual quality of static images presented on a TV screen while their head was in a dynamic (moving) state. The idea is potentially applicable to virtual reality applications; therefore, we also measured the image quality perception of each subject on a head-mounted display. Experiments show a significant decrease in visual acuity and quality perception when the user's head is not static, and give an indication of how much the quality can be reduced without the user noticing any impairments.
Submitted 10 January, 2020;
originally announced January 2020.